Padhraic Smyth and Doug White are planning a statistical topic model project for open-access use of the software, expanding on existing prototype databases such as:

New York Times, 300,000 news articles
The Enron investigation, 250,000 emails
UCI and UCSD faculty specialties, 12,000 technical papers
Pennsylvania Gazette, 80,000 articles from the 18th century
CiteSeer digital collection, 750,000 papers, 500,000 authors
MEDLINE collection, 17 million abstracts

Status: Michael Fischer has adapted some of Padhraic Smyth's techniques to his own code, replicating and extending them; his version will soon be available.