Monday, 7 October 2013

pialign - Phrasal ITG Aligner

Intro: pialign is a package that creates a phrase table and word alignments from an unaligned parallel corpus. Unlike other unsupervised word alignment tools, it builds the phrase table with a fully statistical model rather than heuristics. As a result, it can build phrase tables for phrase-based machine translation that achieve competitive results but are only a fraction of the size of those created with heuristic methods.

*** Note: pialign can extract a very compact phrase table directly from unaligned parallel data. This may be very helpful for SMT systems in mobile environments.