Data from web crawlers

From InterSciWiki


SocSciBot open source

http://socscibot.wlv.ac.uk/ - Steve Franklin - not a recommendation, just an example

 SocSciBot is a Web site crawler designed for research purposes.
 Together with its supporting programs SocSciBot Tools and Cyclist, it
 can be used to conduct link analysis on a site or collection of
 sites, or to run a search engine on a collection of sites. These
 programs can also be used in teaching, to illustrate how link
 analysis and search engines work.

Tutorials and extra information include:

   * To convert large link structure files to Pajek, Tobias Escher of University College London has supplied a special Perl program, available for download at http://socscibot.wlv.ac.uk/socscibot2pajek_v1.0.zip
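For readers who want to see what such a conversion involves, here is a minimal Python sketch that writes a list of links in Pajek's .net format (a *Vertices section with numbered, quoted labels, followed by an *Arcs section of directed links). It is not Tobias Escher's Perl program and it does not read SocSciBot's own file format: the input is assumed to be a plain list of (source, target) pairs, and the function name edges_to_pajek is hypothetical.

```python
def edges_to_pajek(edges):
    """Return Pajek .net text for a sequence of (source, target) pairs.

    Assumption: each pair names the linking site and the linked-to site.
    """
    edges = list(edges)  # allow any iterable; we need two passes
    ids = {}
    for source, target in edges:
        for name in (source, target):
            if name not in ids:
                ids[name] = len(ids) + 1  # Pajek numbers vertices from 1

    lines = [f"*Vertices {len(ids)}"]
    for name, number in ids.items():
        lines.append(f'{number} "{name}"')

    lines.append("*Arcs")  # directed links, one per line
    for source, target in edges:
        lines.append(f"{ids[source]} {ids[target]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [("a.example", "b.example"), ("b.example", "c.example")]
    print(edges_to_pajek(sample), end="")
```

The resulting text can be saved with a .net extension and opened in Pajek. Duplicate links are written as repeated arcs; whether to merge them into weighted arcs is left open here.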

Visual web spider

Visual Web Spider is a commercial package with a list price of $100, selling in Feb 2008 for $75 (marked down).

Support Information:

  • If you have any questions or product-related concerns regarding Visual Web Spider, please contact Newprosoft directly at support@newprosoft.com

WVs installation

Squzer open source

Squzer Distributed Crawler is written mostly in Python. It will be the official web crawler for Declum Search Engine.

It is listed under "Open-source web crawlers" in the Wikipedia article "Web crawler".

Websphinx open source

Yong Ming Kow found an open source web crawler in Java, WebSPHINX: A Personal, Customizable Web Crawler, which finds links between web sites, servers, and email. The only problem seems to be that you cannot limit the saved data to just the links and nodes: it also wants to save, for example, the text contents of each of the sites.
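To illustrate what "just the links and nodes" means in practice, here is a minimal Python sketch that keeps only the outgoing links of a page and discards its text. It does not use WebSPHINX or its API; it relies only on Python's standard html.parser and urllib.parse modules, and the names LinkCollector and extract_edges are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href targets of <a> tags, ignoring all page text."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_edges(page_url, html):
    """Return (page_url, target_url) pairs for every link in the page."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    return [(page_url, link) for link in collector.links]


if __name__ == "__main__":
    page = '<p>Some text <a href="/about">About</a></p>'
    for edge in extract_edges("http://site.example/index.html", page):
        print(edge)
```

Each returned pair is one link between two nodes, so the output of a crawl reduced this way is an edge list rather than a copy of the pages.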
