It supports only statistical text mining (i.e. the bag-of-words model) and makes it easy to create a term-document matrix from a collection of documents.
This term-document matrix can be read into a statistical package like R or MATLAB for further data analysis.
The module can write a term-document matrix to a CSV file, and also print the rows of the matrix to the screen.
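
To make the idea of a term-document matrix concrete, here is a minimal sketch in plain Python using only the standard library. It does not use the textmining API; the function names (tokenize, term_document_matrix, to_csv), the simple regex tokenizer, and the layout (one row per document, one column per term) are assumptions chosen for illustration.

```python
import csv
import io
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def term_document_matrix(docs):
    """Return (vocab, rows): a sorted vocabulary and one count row per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    # Counter returns 0 for terms that a document does not contain.
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows


def to_csv(vocab, rows):
    """Serialize the matrix as CSV text, with the terms as the header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(vocab)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    docs = [
        "John and Bob are brothers.",
        "John went to the store. The store was closed.",
        "Bob went to the store too.",
    ]
    vocab, rows = term_document_matrix(docs)
    # Print the CSV form; the same text could be saved and loaded into R or MATLAB.
    print(to_csv(vocab, rows), end="")
```

The CSV text produced this way has the same general shape that a statistical package expects: a header of terms followed by one line of counts per document.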
textmining by Christian Peccei also provides some useful utilities for finding collocations (i.e. significant two-word phrases), computing the edit distance between words, and splitting long documents into smaller chunks.
The package has a large amount of curated data (stopwords, common names, an English dictionary with parts of speech and word frequencies) which allows the user to extract fairly sophisticated features from a document.
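
Two of the utilities mentioned above, edit distance and document chunking, can be sketched in a few lines of plain Python. This is an illustration, not the package's own code: it assumes the edit distance is the standard Levenshtein distance and that chunks are fixed-size runs of whitespace-separated words, and the names edit_distance and chunk_words are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character edits turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep a match)
            ))
        previous = current
    return previous[-1]


def chunk_words(text, size):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(chunk_words("one two three four five", 2))
```

Keeping only two rows of the distance table at a time keeps memory use proportional to the length of the second word, which is enough when only the distance itself is needed.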

• textmining does not have any natural language processing capabilities such as part-of-speech tagging.

