(Extracting and Querying Relations in Scientific Papers on Language Technology)
hunalign aligns bilingual text on the sentence level. Its input is tokenized and sentence-segmented text in two languages. In the simplest case, its output is a sequence of bilingual sentence pairs (bisentences).
If a dictionary is available, hunalign combines the dictionary information with Gale-Church sentence-length information. Without a dictionary, it first aligns the text using sentence-length information alone and builds an automatic dictionary from that alignment. It then realigns the text in a second pass, using the automatic dictionary.
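To make the "automatic dictionary" step more concrete, here is a minimal Python sketch of how a word dictionary could be derived from a first-pass alignment. It is an illustration only: the function name build_auto_dictionary, the Dice-coefficient scoring, and the min_score threshold are assumptions made for this example and are not taken from hunalign, whose actual heuristic is not described here.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple


def build_auto_dictionary(
    bisentences: List[Tuple[str, str]], min_score: float = 0.5
) -> Dict[str, str]:
    """Map each source token to its best co-occurring target token.

    Hypothetical helper: scores token pairs with the Dice coefficient
    over the sentence pairs produced by a first alignment pass.
    """
    src_count: Counter = Counter()
    tgt_count: Counter = Counter()
    pair_count: Counter = Counter()

    for src, tgt in bisentences:
        # Input is assumed to be tokenized already, so split on whitespace.
        s_tokens, t_tokens = set(src.split()), set(tgt.split())
        src_count.update(s_tokens)
        tgt_count.update(t_tokens)
        pair_count.update(product(s_tokens, t_tokens))

    best: Dict[str, Tuple[str, float]] = {}
    for (s, t), c in pair_count.items():
        dice = 2 * c / (src_count[s] + tgt_count[t])
        # Keep only the highest-scoring candidate above the threshold.
        if dice >= min_score and dice > best.get(s, ("", 0.0))[1]:
            best[s] = (t, dice)
    return {s: t for s, (t, _) in best.items()}


first_pass = [("the dog", "a kutya"), ("the cat", "a macska"), ("dog", "kutya")]
auto_dict = build_auto_dictionary(first_pass)
```

In a two-pass setup of the kind described above, a dictionary like auto_dict would then feed into the scoring of the second alignment pass alongside the sentence-length information.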
Like most sentence aligners, hunalign does not deal with changes of sentence order: it is unable to come up with crossing alignments, i.e., segments A and B in one language corresponding to segments B’ A’ in the other language.
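The restriction to non-crossing alignments follows naturally from the dynamic-programming formulation that length-based aligners typically use. The Python sketch below is a simplified illustration under stated assumptions, not hunalign's implementation: the bead inventory BEADS, the SKIP_PENALTY constant, and the cost (absolute difference in character length, rather than the Gale-Church probabilistic cost) are all chosen for this example.

```python
from typing import List, Tuple

# Allowed bead shapes (source sentences, target sentences).
# Assumed for this sketch; not hunalign's actual inventory.
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
SKIP_PENALTY = 10.0  # hypothetical cost for leaving a sentence unaligned

Bead = Tuple[List[int], List[int]]


def align_by_length(src: List[str], tgt: List[str]) -> List[Bead]:
    """Monotone alignment that minimizes a toy length-mismatch cost."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                len_s = sum(len(s) for s in src[i:ni])
                len_t = sum(len(t) for t in tgt[j:nj])
                step = abs(len_s - len_t)
                if di == 0 or dj == 0:
                    step += SKIP_PENALTY
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (di, dj)

    # Trace back from the end; every step moves forward in both texts,
    # so the resulting beads can never cross.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads


example = align_by_length(["aaaa", "bb", "cccccc"], ["AAAA", "B", "B", "CCCCCC"])
```

Because each step only consumes the next unread sentences on both sides, a reordered pair such as A B versus B' A' cannot be recovered as two crossing links; at best the aligner merges or skips sentences.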
There is nothing Hungarian-specific in hunalign; the name simply reflects the fact that it is part of the hun* NLP toolchain.
hunalign is written in portable C++ and can be built on virtually any operating system.
YouAlign is powered by the AlignFactory engine, which supports all kinds of formats, including Microsoft Word, Excel and PowerPoint, PDF, HTML, XML, Corel WordPerfect, RTF, Lotus WordPro and plain text.