Wednesday, 3 August 2011

TER-Plus (TERp)


TERp is an automatic evaluation metric for Machine Translation. It takes as input a set of reference translations and a set of machine translation outputs for the same data. It aligns the MT output to the reference translations and measures the number of 'edits' needed to transform the MT output into the reference translation. TERp is an extension of TER (Translation Edit Rate) that adds phrasal substitutions (using automatically generated paraphrases), stemming, synonyms, relaxed shifting constraints and other improvements.
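To make the idea of counting 'edits' concrete, here is a minimal sketch of a word-level edit rate. It is not TERp, nor full TER: it only counts insertions, deletions and substitutions, and leaves out shifts, stemming, synonyms and paraphrases. The function names `edit_distance` and `simple_edit_rate` are made up for this illustration, and normalizing by the average reference length is an assumption modelled on how TER is usually described.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(ref) + 1))  # distance from empty hypothesis
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # distance from hyp[:i] to empty reference
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h from the hypothesis
                            curr[j - 1] + 1,      # insert r into the hypothesis
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simple_edit_rate(hypothesis, references):
    """Edits to the closest reference, divided by average reference length.

    Simplified illustration only: no shifts, stems, synonyms or paraphrases.
    """
    hyp = hypothesis.split()
    refs = [r.split() for r in references]
    fewest_edits = min(edit_distance(hyp, ref) for ref in refs)
    avg_ref_len = sum(len(ref) for ref in refs) / len(refs)
    return fewest_edits / avg_ref_len


if __name__ == "__main__":
    print(simple_edit_rate("the cat sat on mat", ["the cat sat on the mat"]))
```

A lower score means fewer edits are needed. In the usage line, one missing word against a six-word reference gives one edit out of six.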

Open Source Machine Translation System Combination


MANY is an MT system combination tool whose architecture is described in the following picture:

The combination can be decomposed into three steps:

  • 1-best hypotheses from all M systems are aligned in order to build M confusion networks (one for each system considered as backbone).
  • All CNs are connected into a single lattice. The first node of each CN is connected to a unique first node with a probability equal to the prior probability assigned to the corresponding backbone. The final nodes are connected to a single final node with an arc probability of one.
  • A token pass decoder is used along with a language model to decode the resulting lattice and the best hypothesis is generated.
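The second and third steps can be sketched roughly as follows. This is a toy illustration under strong assumptions, not MANY's actual decoder: each confusion network is represented as a list of slots mapping words to probabilities, the language model is left out entirely, and the best path is found by taking the most probable word in each slot rather than by token passing. The names `best_path` and `EPS` are hypothetical.

```python
import math

EPS = ""  # empty word (null arc) inside a confusion network slot


def best_path(confusion_networks, priors):
    """Pick the highest-scoring path through a lattice of confusion networks.

    Toy sketch: no language model, so the best path inside one CN is just
    the most probable word of every slot.
    """
    best_words, best_logp = None, -math.inf
    for cn, prior in zip(confusion_networks, priors):
        # Arc from the shared first node into this CN carries the backbone prior.
        logp = math.log(prior)
        words = []
        for slot in cn:
            word, prob = max(slot.items(), key=lambda kv: kv[1])
            logp += math.log(prob)
            if word != EPS:
                words.append(word)
        # Arc to the shared final node has probability one, i.e. adds log(1) = 0.
        if logp > best_logp:
            best_words, best_logp = words, logp
    return " ".join(best_words), best_logp


if __name__ == "__main__":
    cn_a = [{"the": 0.9, "a": 0.1}, {"cat": 0.6, "dog": 0.4}, {EPS: 0.7, "sat": 0.3}]
    cn_b = [{"a": 0.8, "the": 0.2}, {"dog": 0.9, "cat": 0.1}]
    print(best_path([cn_a, cn_b], priors=[0.5, 0.5]))
```

With a language model in the loop, the score of a word depends on its predecessors, so the per-slot maximum used here no longer applies and a real search over the lattice is needed.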


System Combination for Machine Translation

This post collects papers on the system combination problem for Machine Translation systems (collect everything first, filter later).

1) Felipe Sánchez-Martínez. Choosing the best machine translation system to translate a sentence by using only source-language information. In Proceedings of the 15th Annual Conference of the European Association for Machine Translation, p. 97-104, May 30-31, 2011, Leuven, Belgium.

2) Víctor M. Sánchez-Cartagena, Felipe Sánchez-Martínez, Juan Antonio Pérez-Ortiz. Enriching a statistical machine translation system trained on small parallel corpora with rule-based bilingual phrases. In Proceedings of the International Conference on Recent Advances in Natural Language Processing, RANLP 2011, p ?-?, September 12-14, 2011, Hissar, Bulgaria (forthcoming)


Open Toolkit for Automatic MT (Meta-) Evaluation


Asiya has been designed to assist both system and metric developers by offering a rich repository of metrics and meta-metrics. Asiya has been developed by the NLP group of the TALP Research Center at Universitat Politècnica de Catalunya, as an evolution, extension, refactoring, and finally a replacement for its predecessor, IQMT.

Hybrid Example-based and Statistical MT System


Cunei is a hybrid platform for machine translation that draws upon the depth of research in Example-Based MT (EBMT) and Statistical MT (SMT). In particular, Cunei uses a data-driven approach that builds on the basic thesis of EBMT: that some examples in the training data are of higher quality or are more relevant than others. Yet it does so in a statistical manner, embracing much of the modeling pioneered by SMT and allowing for efficient optimization. Instead of using a static model for each phrase-pair, at run-time Cunei models each example of a phrase-pair in the corpus with respect to the input and combines them into dynamic collections of examples. Ultimately, this approach provides a more consistent model and a more flexible framework for integration of novel run-time features.

Stanford Biomedical Event Parser

David McClosky, Mihai Surdeanu, and Christopher D. Manning. 2011. Event Extraction as Dependency Parsing. In Proceedings of the Association for Computational Linguistics - Human Language Technologies 2011 Conference (ACL-HLT 2011). [PDF]