Padhraic Smyth, Department of Computer Science, UCI
Social network database project
Statistical topic models
Padhraic Smyth is one of the leading researchers in statistical pattern detection and data mining, machine learning, and information theory. His book Advances in Knowledge Discovery and Data Mining (AAAI Press, 1996) was followed by the co-authored Principles of Data Mining (MIT Press, 2001) and Modeling the Internet and the Web: Probabilistic Methods and Algorithms (John Wiley and Sons, 2003). He received best paper awards at the 1997 and 2002 ACM SIGKDD Conferences, an IBM Faculty Partnership Award in 2001, an NSF Faculty CAREER award in 1997, and an Award for Excellence in Research in 1993 at the Jet Propulsion Laboratory, Pasadena, where he was a Technical Group Leader. He received a first class honors degree in Electronic Engineering from University College Galway (National University of Ireland) in 1984, and the MSEE and PhD degrees from the Electrical Engineering Department at the California Institute of Technology in 1985 and 1988 respectively. He has been on the UCI faculty since 1996.
He is currently an associate editor for the Journal of the American Statistical Association and for the IEEE Transactions on Knowledge and Data Engineering, has served as an action editor for the Machine Learning Journal, is a founding associate editor for the Journal of Data Mining and Knowledge Discovery, and a founding editorial board member of the Journal of Machine Learning Research. He served as program chair for the 33rd Symposium on Computer Science and Statistics in 2001 and served as general chair for the Sixth International Workshop on AI and Statistics in 1997.
Statistical topic model projects
- New York Times, 300,000 news articles
- The Enron investigation, 250,000 emails
- UCI and UCSD faculty specialties, 12,000 technical papers
- Pennsylvania Gazette, 80,000 articles from the 18th century
- CiteSeer digital collection, 750k papers, 500k authors
- MEDLINE collection, 17 million abstracts
- US Patent collection
Michael Fischer's implementation
a simple example of what I have in Grok
Search the Anthropological Index Online http://grok.anthropology.ac.uk/grok/bin/view/AnthroGrok/AIOGraph
Search Paul Stirling's Fieldnotes (or Wenonah Lyon or Mike Fischer Summaries) http://grok.anthropology.ac.uk/grok/bin/view/AnthroGrok/Search+Fieldnotes
These don't display any statistics, just a kind of topic cloud for searching a text. This is all done in the browser, and the data stream coming in already carries the statistics, so we can customise the page to display them.
The website looks like a search engine combined with a statistical topic modeling tool. First it searches the references using keywords. Then, based on the results, it generates a list of keywords in 'bubbles.' Larger bubbles are more relevant.
Is my perception correct?
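If that reading is right, the search-then-bubbles behaviour could be sketched roughly as below. This is only an illustration, not the Grok implementation. The sample references, the stop-word list, the function names `search` and `bubble_sizes`, the pixel range, and the use of raw co-occurrence counts as the relevance score are all assumptions. The real site presumably draws on the topic model statistics in its data stream instead.

```python
import re
from collections import Counter

# Hypothetical stop-word list; not taken from the site.
STOP_WORDS = {"the", "of", "in", "and", "a", "an", "on", "for", "to"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def search(references, query):
    """Return the references that contain every term of the query."""
    terms = set(tokenize(query))
    return [r for r in references if terms <= set(tokenize(r))]


def bubble_sizes(results, query, min_px=10, max_px=40):
    """Map each keyword that co-occurs with the query to a bubble diameter.

    Relevance here is simply the number of matching references that
    mention the keyword (an assumed stand-in for topic model statistics).
    The most frequent keyword gets max_px; the rest scale linearly.
    """
    query_terms = set(tokenize(query))
    counts = Counter(
        w for r in results for w in set(tokenize(r)) if w not in query_terms
    )
    if not counts:
        return {}
    top = max(counts.values())
    return {w: min_px + (max_px - min_px) * c / top for w, c in counts.items()}


# Made-up reference titles, for illustration only.
REFERENCES = [
    "Kinship and marriage in a Turkish village",
    "Village economy and kinship in Pakistan",
    "Urban migration and kinship networks",
    "Ritual and economy in a Turkish town",
]

results = search(REFERENCES, "kinship")
sizes = bubble_sizes(results, "kinship")
for word, px in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {px:.0f}px")
```

In this sketch a keyword that appears in more of the matching references gets a larger bubble, which matches the "larger means more relevant" reading of the page.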