Sunday 29 May 2011

SALM: Suffix Array and its Applications in Empirical Language Processing

Link: http://projectile.sv.cmu.edu/research/public/tools/salm/salm.htm
Another customized version: https://github.com/jhclark/salm

SALM is a C++ package that provides functions to locate n-grams in a large corpus and estimate their statistics. The SALM toolkit includes example applications such as estimating type/token frequency, locating n-gram occurrences, and a suffix array language model that supports arbitrarily long histories over a very large training corpus.
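
To make the idea concrete, here is a minimal sketch of how a word-level suffix array can be used to count and locate n-grams. This is not SALM's actual API or implementation: the names `Corpus`, `buildSuffixArray`, `comparePrefix`, `countNgram`, and `locateNgram` are hypothetical, and the construction is a naive sort that would be far too slow for a very large corpus. It only illustrates the principle that, once all suffixes are sorted, every occurrence of an n-gram sits in one contiguous range that binary search can find.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical type for this sketch: a corpus is a sequence of word tokens.
using Corpus = std::vector<std::string>;

// Naive suffix array construction: sort all start positions by the token
// sequence that begins there. Fine for a toy corpus, not for a large one.
std::vector<std::size_t> buildSuffixArray(const Corpus& corpus) {
    std::vector<std::size_t> sa(corpus.size());
    std::iota(sa.begin(), sa.end(), std::size_t{0});
    std::sort(sa.begin(), sa.end(), [&](std::size_t a, std::size_t b) {
        return std::lexicographical_compare(corpus.begin() + a, corpus.end(),
                                            corpus.begin() + b, corpus.end());
    });
    return sa;
}

// Compare the first ngram.size() tokens of the suffix at `pos` with `ngram`.
// Returns -1, 0, or 1. A suffix that ends early sorts before the n-gram.
int comparePrefix(const Corpus& corpus, std::size_t pos, const Corpus& ngram) {
    for (std::size_t i = 0; i < ngram.size(); ++i) {
        if (pos + i >= corpus.size()) return -1;
        int c = corpus[pos + i].compare(ngram[i]);
        if (c != 0) return c < 0 ? -1 : 1;
    }
    return 0;
}

// Return the corpus positions where `ngram` occurs, in increasing order.
std::vector<std::size_t> locateNgram(const Corpus& corpus,
                                     const std::vector<std::size_t>& sa,
                                     const Corpus& ngram) {
    // Binary search for the contiguous block of suffixes starting with ngram.
    auto lo = std::lower_bound(sa.begin(), sa.end(), ngram,
        [&](std::size_t pos, const Corpus& q) {
            return comparePrefix(corpus, pos, q) < 0;
        });
    auto hi = std::upper_bound(sa.begin(), sa.end(), ngram,
        [&](const Corpus& q, std::size_t pos) {
            return comparePrefix(corpus, pos, q) > 0;
        });
    std::vector<std::size_t> positions(lo, hi);
    std::sort(positions.begin(), positions.end());
    return positions;
}

// Token frequency of an n-gram is the size of its suffix array range.
std::size_t countNgram(const Corpus& corpus,
                       const std::vector<std::size_t>& sa,
                       const Corpus& ngram) {
    return locateNgram(corpus, sa, ngram).size();
}
```

For example, with the tokens of "the cat sat on the mat the cat ran" as the corpus, calling `buildSuffixArray` once and then `countNgram` for the bigram "the cat" gives its frequency, and `locateNgram` gives the token offsets where it starts. The same lookup works for an n-gram of any length, which is what allows a suffix array language model to use arbitrarily long histories without fixing n in advance.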
