Monday 5 January 2015

MTTK - Machine Translation Toolkit

Intro: MTTK is a collection of software tools for the alignment of parallel text for use in Statistical Machine Translation. With MTTK you can:
  • Align document translation pairs at the sentence or sub-sentence level, sometimes known as chunking. This is a useful pre-processing step to prepare collections of translations for use in estimating the parameters of complex alignment models. Sub-sentence alignment in particular makes it possible to split long sentences, which would otherwise have to be discarded, into shorter aligned segments.
  • Train statistical models for parallel text alignment. The following models are supported:
      • IBM Model-1 and Model-2
      • Word-to-Word HMMs
      • Word-to-Phrase HMMs, with bigram translation probabilities
  • Parallelize your model training procedures. If you have multiple CPUs available, you can partition your translation training texts into subsets and process them in parallel, which speeds up iterative parameter re-estimation and reduces the amount of memory needed in training. Partitioning does not change the result: parameter estimation remains exact EM.
  • Generate word-to-word and word-to-phrase alignments of parallel text. MTTK can generate Viterbi alignments of parallel text (both training text and other texts) under the supported alignment models.
  • Extract word-to-word translation tables from aligned bitext and from the estimated models.
  • Extract phrase-to-phrase translation tables (phrase-pair inventories) from aligned parallel text.
  • Use the HMM alignment models to induce phrase translations under their statistical models. Phrase-pair induction can generate richer inventories of phrase translations than can be extracted from Viterbi alignments.
  • Edit the C++ source code to implement your own estimation and alignment procedures.
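
To make the idea of partitioned yet exact EM training more concrete, here is a minimal Python sketch for IBM Model-1. It is not MTTK code and does not use MTTK's interfaces: the function names, the `NULL` token, and the toy corpus are all assumptions made for illustration. Each partition computes its own expected counts in the E-step, the counts are summed, and a single M-step renormalises them, so the result should match training on the unpartitioned text up to floating-point rounding.

```python
from collections import defaultdict

NULL = "<NULL>"  # hypothetical token standing in for the empty source word

# Toy parallel corpus (illustrative only): (source words, target words)
CORPUS = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]


def expected_counts(pairs, t):
    """E-step over one partition: expected (f, e) counts and per-e totals."""
    counts = defaultdict(float)
    totals = defaultdict(float)
    for src, tgt in pairs:
        src = [NULL] + src
        for f in tgt:
            # Normaliser: total probability of f given any source word here
            z = sum(t[(f, e)] for e in src)
            for e in src:
                p = t[(f, e)] / z  # posterior that f aligns to e
                counts[(f, e)] += p
                totals[e] += p
    return counts, totals


def train_model1(partitions, iterations=5):
    """Exact EM for Model-1 translation probabilities t(f | e).

    `partitions` is a list of sub-corpora. The E-step for each partition
    is independent of the others, so in a real setup each call could run
    on its own CPU; here they run one after another.
    """
    tgt_vocab = {f for part in partitions for _, tgt in part for f in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation
    for _ in range(iterations):
        counts = defaultdict(float)
        totals = defaultdict(float)
        for part in partitions:
            part_counts, part_totals = expected_counts(part, t)
            # Merge this partition's sufficient statistics into the global sums
            for key, value in part_counts.items():
                counts[key] += value
            for e, value in part_totals.items():
                totals[e] += value
        # M-step: renormalise the pooled counts
        t = defaultdict(
            float, {(f, e): c / totals[e] for (f, e), c in counts.items()}
        )
    return t
```

Calling `train_model1([CORPUS])` trains on a single partition, while `train_model1([CORPUS[:1], CORPUS[1:]])` splits the same text in two. Because only the summed counts enter the M-step, both calls estimate the same model, which is the sense in which partitioned training stays exact.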
