Intro: MTTK is a collection of software tools that align parallel text for use in Statistical Machine Translation (SMT). With MTTK you can:
- Align document translation pairs at the sentence or sub-sentence level, a process sometimes known as chunking. This is a useful pre-processing step for preparing collections of translations for use in estimating the parameters of complex alignment models. Sub-sentence alignment in particular makes it possible to split long sentence pairs, which would otherwise have to be discarded, into shorter aligned segments.
- Train statistical models for parallel text alignment. The following models are supported:
  - IBM Model-1 and Model-2
  - Word-to-Word HMMs
  - Word-to-Phrase HMMs, with bigram translation probabilities
- Parallelize your model training procedures. If multiple CPUs are available, you can partition the parallel training text into subsets, which speeds up iterative parameter re-estimation and reduces the amount of memory needed during training. Parameter estimation remains exact EM; the partitioning is not an approximation.
- Generate word-to-word and word-to-phrase alignments of parallel text. MTTK can generate Viterbi alignments under any of the supported alignment models, both for the training text and for other parallel text.
- Extract word-to-word translation tables from aligned bitext and from the estimated models.
- Extract phrase-to-phrase translation tables (phrase-pair inventories) from aligned parallel text.
- Induce phrase translations directly under the statistical HMM alignment models. Phrase-pair induction can generate richer inventories of phrase translations than can be extracted from Viterbi alignments.
- Edit the C++ source code to implement your own estimation and alignment procedures.
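To make the training and parallelization points concrete, here is a minimal C++ sketch of IBM Model-1 EM training in which the E-step runs separately over partitions of the corpus and the fractional counts are summed before the M-step. It is a textbook illustration, not MTTK's code or API; every name in it (SentencePair, TTable, emIteration and so on) is hypothetical. Because summing counts does not depend on how the corpus was split, the partitioned run gives the same parameters as a single pass over the whole corpus, up to floating-point rounding. That is the sense in which partitioned training can still be exact EM.

```cpp
// Hypothetical sketch of Model-1 EM with partitioned E-steps; not MTTK's API.
#include <algorithm>
#include <cassert>
#include <cmath>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One sentence pair. Prepend a "NULL" token to src yourself if target
// words should be able to align to nothing.
struct SentencePair {
    std::vector<std::string> src;  // e, the conditioning side
    std::vector<std::string> tgt;  // f, the generated side
};

// Translation table t(f|e), keyed by (e, f).
using TTable = std::map<std::pair<std::string, std::string>, double>;

// Expected (fractional) counts c(e, f) and their totals per e.
struct Counts {
    TTable joint;
    std::map<std::string, double> total;

    // Merging partial counts is plain addition, so the way the corpus was
    // partitioned does not change the result (up to rounding).
    void add(const Counts& other) {
        for (const auto& [key, v] : other.joint) joint[key] += v;
        for (const auto& [e, v] : other.total) total[e] += v;
    }
};

// Uniform start: every co-occurring (e, f) gets the same weight. The E-step
// normalizes over e for each target word, so the constant does not matter.
TTable uniformInit(const std::vector<SentencePair>& corpus) {
    TTable t;
    for (const auto& sp : corpus)
        for (const auto& e : sp.src)
            for (const auto& f : sp.tgt) t[{e, f}] = 1.0;
    return t;
}

// E-step over one partition: posterior probability of each link e -> f.
Counts expectedCounts(const std::vector<SentencePair>& part, const TTable& t) {
    Counts c;
    for (const auto& sp : part)
        for (const auto& f : sp.tgt) {
            double norm = 0.0;
            for (const auto& e : sp.src) norm += t.at({e, f});
            for (const auto& e : sp.src) {
                const double p = t.at({e, f}) / norm;
                c.joint[{e, f}] += p;
                c.total[e] += p;
            }
        }
    return c;
}

// M-step: t(f|e) = c(e, f) / sum over f' of c(e, f').
TTable normalize(const Counts& c) {
    TTable t;
    for (const auto& [key, v] : c.joint) {
        const double denom = c.total.at(key.first);
        assert(denom > 0.0);
        t[key] = v / denom;
    }
    return t;
}

// One EM iteration. The per-partition E-steps are independent, so each
// expectedCounts call could run on its own CPU.
TTable emIteration(const std::vector<std::vector<SentencePair>>& partitions,
                   const TTable& t) {
    Counts sum;
    for (const auto& part : partitions) sum.add(expectedCounts(part, t));
    return normalize(sum);
}

// Largest absolute difference between two tables with the same keys.
double maxAbsDiff(const TTable& a, const TTable& b) {
    double d = 0.0;
    for (const auto& [key, v] : a) d = std::max(d, std::fabs(v - b.at(key)));
    return d;
}
```

For example, training on a toy corpus once as a single partition and once split into two partitions should give tables whose maxAbsDiff is at rounding level, while t(la|the) grows as EM proceeds.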
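The Viterbi alignment under a word-to-word HMM can be sketched with standard dynamic programming: the hidden states are source positions, the emissions are t(f|e), and the transitions depend only on the jump width between consecutive aligned source positions. This is a generic sketch, assuming a uniform initial distribution, no NULL state, and caller-supplied translation and jump probabilities; the trans and jump callbacks and the function name are hypothetical. MTTK's own models, including the word-to-phrase HMMs, are richer than this.

```cpp
// Hypothetical sketch of word-to-word HMM Viterbi alignment; not MTTK's API.
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// Hypothetical model callbacks: trans(f, e) = t(f|e); jump(d) = probability of
// moving d source positions between consecutive target words.
using TransFn = std::function<double(const std::string&, const std::string&)>;
using JumpFn = std::function<double(int)>;

// Returns a[j], the 0-based source position aligned to target word j,
// along the most probable state sequence (the Viterbi path).
std::vector<int> viterbiAlign(const std::vector<std::string>& src,
                              const std::vector<std::string>& tgt,
                              const TransFn& trans, const JumpFn& jump) {
    const int I = static_cast<int>(src.size());
    const int J = static_cast<int>(tgt.size());
    assert(I > 0 && J > 0);
    const double kNegInf = -std::numeric_limits<double>::infinity();

    // score[j][i]: best log-probability of a path ending with f_j aligned to e_i.
    std::vector<std::vector<double>> score(J, std::vector<double>(I, kNegInf));
    std::vector<std::vector<int>> back(J, std::vector<int>(I, 0));

    // Uniform initial distribution over source positions.
    for (int i = 0; i < I; ++i)
        score[0][i] = std::log(1.0 / I) + std::log(trans(tgt[0], src[i]));

    for (int j = 1; j < J; ++j)
        for (int i = 0; i < I; ++i) {
            const double emit = std::log(trans(tgt[j], src[i]));
            for (int k = 0; k < I; ++k) {
                const double s = score[j - 1][k] + std::log(jump(i - k)) + emit;
                if (s > score[j][i]) {
                    score[j][i] = s;
                    back[j][i] = k;
                }
            }
        }

    // Pick the best final state, then follow the back-pointers.
    int best = 0;
    for (int i = 1; i < I; ++i)
        if (score[J - 1][i] > score[J - 1][best]) best = i;
    std::vector<int> a(J);
    for (int j = J - 1; j >= 0; --j) {
        a[j] = best;
        best = back[j][best];
    }
    return a;
}
```

With t(la|the), t(maison|house) and t(bleue|blue) set high and forward jumps of +1 favored, aligning "the blue house" with "la maison bleue" still yields the crossing alignment 0, 2, 1, because the strong translation probabilities outweigh the jump penalties. In a real model the jump probabilities would also be normalized per previous state; here that is left to the caller.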
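Phrase-pair extraction from word-aligned parallel text is commonly done by enumerating source spans and keeping each one whose aligned target span is consistent with the alignment, meaning no word inside either span is linked to a word outside the other. The sketch below implements that common consistency criterion; whether MTTK applies exactly this rule is an assumption, and this simple version does not extend target spans over unaligned words. The type and function names are hypothetical.

```cpp
// Hypothetical sketch of consistency-based phrase extraction; not MTTK's API.
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A word alignment link: (source position, target position), 0-based.
using Link = std::pair<int, int>;
// Inventory of (source phrase, target phrase) pairs.
using PhraseTable = std::set<std::pair<std::string, std::string>>;

// Joins words[from..to] (inclusive) with single spaces.
std::string joinSpan(const std::vector<std::string>& words, int from, int to) {
    std::string s;
    for (int k = from; k <= to; ++k) {
        if (k > from) s += ' ';
        s += words[k];
    }
    return s;
}

// Extracts every phrase pair of at most maxLen words per side that is
// consistent with the alignment links.
PhraseTable extractPhrases(const std::vector<std::string>& src,
                           const std::vector<std::string>& tgt,
                           const std::vector<Link>& links, int maxLen) {
    assert(maxLen > 0);
    PhraseTable out;
    const int I = static_cast<int>(src.size());
    for (int s1 = 0; s1 < I; ++s1) {
        for (int s2 = s1; s2 < I && s2 - s1 < maxLen; ++s2) {
            // Smallest target span covering every link from src[s1..s2].
            int t1 = static_cast<int>(tgt.size()), t2 = -1;
            for (const auto& [i, j] : links)
                if (i >= s1 && i <= s2) {
                    t1 = std::min(t1, j);
                    t2 = std::max(t2, j);
                }
            if (t2 < 0 || t2 - t1 >= maxLen) continue;  // unaligned or too long
            // Consistency: no target word in [t1, t2] may link outside [s1, s2].
            bool consistent = true;
            for (const auto& [i, j] : links)
                if (j >= t1 && j <= t2 && (i < s1 || i > s2)) {
                    consistent = false;
                    break;
                }
            if (consistent) out.insert({joinSpan(src, s1, s2), joinSpan(tgt, t1, t2)});
        }
    }
    return out;
}
```

For "the blue house" / "la maison bleue" with the links the-la, blue-bleue and house-maison, this yields five pairs, including "blue house" / "maison bleue", but nothing for "the blue", whose links span target words that are also linked to "house".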