Statistical topic model project
- New York Times, 300,000 news articles
- The Enron investigation, 250,000 emails
- UCI and UCSD faculty specialties, 12,000 technical papers
- Pennsylvania Gazette, 80,000 articles from the 18th century
- CiteSeer digital collection, 750,000 papers and 500,000 authors
- MEDLINE collection, 17 million abstracts
Status: Michael Fischer has adapted some of Padhraic Smyth's techniques in his own code. He has replicated the technique and extended it further, and this work will soon be available.