Data from web crawlers
From InterSciWiki
SocSciBot open source
http://socscibot.wlv.ac.uk/ - Steve Franklin - not a recommendation, just an example
SocSciBot is a Web site crawler designed for research purposes. Together with its supporting programs SocSciBot Tools and Cyclist, it can be used to conduct link analysis on a site or collection of sites, or to run a search engine on a collection of sites. These programs can also be used in teaching, to illustrate how link analysis and search engines work.
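As a rough illustration of the kind of link analysis meant here, the following minimal Python sketch counts how many links each site receives from other sites. It is not SocSciBot code and does not use SocSciBot's file formats; the function name `site_inlinks` and the example URLs are hypothetical, and the input is assumed to be a plain list of (source URL, target URL) pairs.

```python
from collections import Counter
from urllib.parse import urlparse


def site_inlinks(links):
    """Count links arriving at each host from a different host."""
    counts = Counter()
    for source, target in links:
        src_host = urlparse(source).netloc.lower()
        dst_host = urlparse(target).netloc.lower()
        # Ignore malformed URLs and links that stay inside one site.
        if src_host and dst_host and src_host != dst_host:
            counts[dst_host] += 1
    return counts


# Hypothetical crawl result: (page containing the link, page linked to).
links = [
    ("http://a.example/index.html", "http://b.example/"),
    ("http://a.example/about.html", "http://b.example/page"),
    ("http://b.example/", "http://c.example/"),
    ("http://c.example/", "http://c.example/inner"),
]
inlinks = site_inlinks(links)
for host, count in inlinks.most_common():
    print(host, count)
```

Counting by host rather than by page is one possible design choice; it treats each site as a single node, which suits comparisons across a collection of sites.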
Tutorials and extra information include:
* To convert large link structure files to Pajek format, Tobias Escher of University College London has supplied a special Perl program, available for download at http://socscibot.wlv.ac.uk/socscibot2pajek_v1.0.zip.
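To show what such a conversion produces, here is a minimal Python sketch that writes a list of links in Pajek's .net format (a `*Vertices` section with numbered labels, then an `*Arcs` section of directed links). It is not the Perl program mentioned above, and it does not read SocSciBot's own link structure files; the function name `to_pajek` and the input, a list of (source, target) label pairs, are assumptions made for illustration.

```python
def to_pajek(edges):
    """Return Pajek .net text for a list of (source, target) label pairs."""
    labels = []
    index = {}
    for source, target in edges:
        for label in (source, target):
            if label not in index:
                labels.append(label)
                index[label] = len(labels)  # Pajek vertex ids start at 1

    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{index[label]} "{label}"' for label in labels]
    lines.append("*Arcs")  # directed links, one "from to" pair per line
    lines += [f"{index[s]} {index[t]}" for s, t in edges]
    return "\n".join(lines) + "\n"


# Hypothetical site-level links.
edges = [
    ("a.example", "b.example"),
    ("b.example", "c.example"),
    ("a.example", "c.example"),
]
pajek_text = to_pajek(edges)
print(pajek_text)
```

The resulting text can be saved with a `.net` extension and opened in Pajek.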
Visual web spider
Visual Web Spider is a commercial package with a $100 list price, selling in Feb 2008 for $75 (marked down).
Support Information:
- If you have any questions regarding Visual Web Spider, or any product-related concerns, please contact Newprosoft directly at support@newprosoft.com
Squzer open source
Squzer Distributed Crawler is written mostly in Python. It will be the official web crawler for Declum Search Engine.
It is listed under "Open-source web crawlers" at Wikipedia:Web crawler.
Websphinx open source
Yong Ming Kow found an open source web crawler in Java, WebSPHINX: A Personal, Customizable Web Crawler, which finds links between web sites, servers, and email. The only problem seems to be that you cannot limit the saved data to just the links and nodes; it also wants to save, for example, the text contents of each of the sites.
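For comparison, keeping only the links and discarding the page text can be sketched in a few lines of Python using the standard library's `html.parser`. This is not WebSPHINX code and says nothing about its API; the class `LinkCollector`, the function `extract_links`, and the example page are hypothetical. The sketch works on an HTML string that has already been downloaded, and it keeps only http and https links, so email (`mailto:`) links are dropped here by choice.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect (source page, target URL) pairs and ignore all page text."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                absolute = urljoin(self.base_url, value)
                if urlparse(absolute).scheme in ("http", "https"):
                    self.links.append((self.base_url, absolute))


def extract_links(base_url, html):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical page content.
page = (
    '<p>Some text <a href="/about">About</a> '
    '<a href="http://other.example/">Other</a> '
    '<a href="mailto:x@example.org">Mail</a></p>'
)
found = extract_links("http://site.example/index.html", page)
for source, target in found:
    print(source, "->", target)
```

Only the link pairs are retained, so the stored data stays small even for large sites.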
