Comparing pLSI and Non-negative Matrix Factorization
This is the project homepage of Zack McCoy and Martin Blom for CS395T Data Mining: A Statistical Learning Perspective, Spring 2007.
We've completed our project and all the relevant files are below.
A quick note: all the code is in Python and requires scipy, numpy, and pylab, all of which you can download from www.scipy.org.
The main code file is nmfplsa.py, and its entry point is the function main(). It relies on two other files that actually implement the algorithms: the NMF algorithm is in nmf.py, and the pLSA algorithms (both tempered and non-tempered) are here.
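For readers who want a feel for the NMF side before opening the files, here is a minimal sketch of NMF with the standard multiplicative updates for the generalized KL divergence, the objective usually related to the pLSI likelihood. This is not the code in nmf.py: the names nmf_kl and kl_divergence, the parameters, and the random initialization are all assumptions made for illustration, and it needs a reasonably recent numpy.

```python
import numpy as np

def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix V (n x m) as W @ H with W (n x rank), H (rank x m).

    Hypothetical helper, not the project's nmf.py. Uses multiplicative
    updates that do not increase the generalized KL divergence D(V || WH).
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive random start so the multiplicative updates can move.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Update H: scale each entry by the W-weighted ratio V / (WH),
        # normalized by the column sums of W.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Update W symmetrically, normalized by the row sums of H.
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence between V and its approximation WH."""
    return np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH)
```

A typical call would be W, H = nmf_kl(V, rank=10) on a term-document count matrix V. Normalizing the columns of W and the rows of H to sum to one gives factors that can be read as conditional probabilities, which is what makes a side-by-side comparison with pLSA natural.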
The data we used is available here. Once you save it to the same directory as the Python files, everything should run as is.
The write-up is available as a PDF here, and the four pictures mentioned in it are available here, here, here, and here.