Thursday, 11 December 2014

Neural Machine Translation

Researchers around the world (especially the folks at Google) are taking the approaches of Statistical Machine Translation (SMT) (e.g. word-based, phrase-based, hierarchical, and syntax-based) to the next level, namely Neural Machine Translation.

In general, Neural Machine Translation aims to simplify the SMT pipeline: a single, large neural network takes the source sentence as an input sequence and produces the target sentence as an output sequence.
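To make this concrete, here is a toy encoder-decoder sketch in NumPy. Everything in it (vocabularies, layer sizes, random weights, greedy decoding) is a made-up illustration of the data flow, not any of the actual systems listed below:

```python
# Toy encoder-decoder ("sequence to sequence") sketch in plain NumPy.
# All sizes, weights, and the vocabulary are invented for illustration;
# real NMT systems use trained (often deep) LSTMs/GRUs and large vocabularies.
import numpy as np

rng = np.random.default_rng(0)

src_vocab = ["<s>", "ich", "liebe", "dich", "</s>"]
tgt_vocab = ["<s>", "i", "love", "you", "</s>"]
emb, hid = 8, 16  # embedding and hidden sizes (placeholders)

# Randomly initialised parameters (a real system would learn these).
E_src = rng.normal(0, 0.1, (len(src_vocab), emb))
E_tgt = rng.normal(0, 0.1, (len(tgt_vocab), emb))
W_enc = rng.normal(0, 0.1, (hid, emb + hid))        # encoder RNN
W_dec = rng.normal(0, 0.1, (hid, emb + hid))        # decoder RNN
W_out = rng.normal(0, 0.1, (len(tgt_vocab), hid))   # hidden -> target word scores

def rnn_step(W, x, h):
    """One vanilla tanh-RNN step."""
    return np.tanh(W @ np.concatenate([x, h]))

def translate(src_words, max_len=10):
    # Encoder: read the whole source sequence into one summary vector h.
    h = np.zeros(hid)
    for w in src_words:
        h = rnn_step(W_enc, E_src[src_vocab.index(w)], h)
    # Decoder: generate target words one by one (greedy decoding).
    out, prev = [], "<s>"
    for _ in range(max_len):
        h = rnn_step(W_dec, E_tgt[tgt_vocab.index(prev)], h)
        scores = W_out @ h
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()                        # softmax over target words
        prev = tgt_vocab[int(np.argmax(probs))]
        if prev == "</s>":
            break
        out.append(prev)
    return out

# Untrained weights, so the output is nonsense; it only shows the data flow.
print(translate(["ich", "liebe", "dich", "</s>"]))
```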

Here I am trying to catch up with the recent progress of Neural Machine Translation.

*** People & Groups

1) LISA Lab, University of Montreal, led by Prof. Yoshua Bengio
Latest Demo: http://104.131.78.120/

2) Quoc Viet Le and co. at Google (e.g. Ilya Sutskever, Nal Kalchbrenner)

3) Phil Blunsom's group at Oxford Uni.

4) Dzmitry Bahdanau at Jacobs University Bremen

5) Richard Socher at Stanford Uni.

6) Kyunghyun Cho at NYU?

7) ...

*** Notable Papers
1a) Sequence to Sequence Learning with Neural Networks (Ilya Sutskever et al., NIPS 2014)
Note:
- The core idea behind neural machine translation.

1b) Generating Sequences With Recurrent Neural Networks (Alex Graves, arXiv 2013)
- TBA



4) Addressing the Rare Word Problem in Neural Machine Translation (Thang Luong et al., draft version 2014)

5) On Using Monolingual Corpora in Neural Machine Translation (Caglar Gulcehre et al., arXiv 2015)

6) Ask Me Anything: Dynamic Memory Networks for Natural Language Processing (Richard Socher and co. at MetaMind, arXiv June 2015)
- MT results have not been released yet!

7) Effective Approaches to Attention-based Neural Machine Translation (Thang Luong et al., EMNLP'15); a rough sketch of the attention idea follows this list.

8) (to be updated)
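Since entry 7 above is about attention-based NMT, here is a rough sketch of the general idea of (dot-product, "global") attention over the encoder states. The shapes and random vectors are placeholders of my own, not the paper's model:

```python
# Rough sketch of "global" dot-product attention: at each decoder step, compare
# the decoder state with every encoder state, turn the scores into a softmax
# distribution, and build a context vector as the weighted average of encoder
# states. Shapes and the random states below are placeholders, not a real model.
import numpy as np

rng = np.random.default_rng(0)
src_len, hid = 6, 16
enc_states = rng.normal(size=(src_len, hid))   # one hidden state per source word
dec_state = rng.normal(size=hid)               # current decoder hidden state

scores = enc_states @ dec_state                # dot-product score per source position
weights = np.exp(scores - scores.max())
weights /= weights.sum()                       # attention distribution over the source
context = weights @ enc_states                 # weighted average of encoder states

print(np.round(weights, 3))   # how strongly each source position is attended to
print(context.shape)          # (hid,) vector used when predicting the next target word
```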

In addition, some other approaches use neural components to enhance the current state-of-the-art SMT framework, for example:

*** For Language Model:

1) Decoding with Large-Scale Neural Language Models Improves Translation (Ashish Vaswani et al., EMNLP 2013)
Note:
- Resulting toolkit: NPLM ver 0.3 (http://nlg.isi.edu/software/nplm/)

Comments
- It is quite hard to choose the optimal parameters (e.g. number of hidden nodes, input and output embedding dimensions) across data sets and domains.
- In Moses, the NPLM feature slows down the decoder.
- It does improve translation performance when used together with n-gram LM features, but I am not sure whether it can completely replace n-gram LMs. A toy sketch of this kind of feed-forward neural LM is given after this list.

2) OxLM: A Neural Language Modelling Framework for Machine Translation (Paul Baltescu et al., The Prague Bulletin of Mathematical Linguistics 2014)

Note:
- Resulting toolkit: OxLM (https://github.com/pauldb89/oxlm)
- Moses already has this feature.

3) rwthlm - A toolkit for training neural network language models (feed-forward, recurrent, and long short-term memory neural networks). The software was written by Martin Sundermeyer.

4) (to be updated)
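For intuition, here is a toy sketch of the kind of feed-forward n-gram neural LM these toolkits implement: embed the n-1 context words, pass them through a hidden layer, and take a softmax over the vocabulary. All names and sizes are placeholders, not any toolkit's actual API:

```python
# Toy feed-forward n-gram neural LM: P(next word | previous n-1 words).
# Vocabulary, sizes, and weights are placeholders; real toolkits train these and
# use extra tricks (e.g. noise-contrastive estimation) for large vocabularies.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<s>", "the", "cat", "sat", "on", "mat", "</s>"]
order, emb, hid = 4, 16, 32            # a 4-gram LM conditions on 3 context words

E = rng.normal(0, 0.1, (len(vocab), emb))            # input word embeddings
W1 = rng.normal(0, 0.1, (hid, (order - 1) * emb))    # context -> hidden layer
W2 = rng.normal(0, 0.1, (len(vocab), hid))           # hidden -> output word scores

def next_word_probs(context_words):
    """Return P(w | context) for every word w in the vocabulary."""
    x = np.concatenate([E[vocab.index(w)] for w in context_words])
    h = np.tanh(W1 @ x)
    scores = W2 @ h
    p = np.exp(scores - scores.max())
    return p / p.sum()                               # softmax over the vocabulary

# Untrained, so the distribution is arbitrary; it only shows the architecture.
p = next_word_probs(["the", "cat", "sat"])
print({w: round(float(pw), 3) for w, pw in zip(vocab, p)})
```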

*** For Translation Model:

1) Fast and Robust Neural Network Joint Models for Statistical Machine Translation (Jacob Devlin et al., ACL 2014)
Note:
- ACL 2014 best paper award. 
- According to the paper, they obtained very impressive performance for Arabic-English translation and good performance for Chinese-English translation (datasets: OpenMT 2012, BOLT; domains: news, web forums).
- A basic implementation of this model is already included in Moses under the name "BilingualLM".
- NPLM can be used to train the models for this.

Comments
- Personally, I tried this model with Moses and evaluated it on conversational domains (e.g. SMS, chat, conversational telephone speech) using the OpenMT'15 datasets. I obtained a good (but not very impressive) gain of 0.7-1.0 BLEU over a basic baseline. Using this model together with other strong features did not give the significantly better performance reported in the paper :(. A toy sketch of what this model conditions on is given after this list.
- Optimizing the parameters for this model is an exhausting task.


3) (to be updated)
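As promised above, here is a toy sketch of what such a bilingual neural joint model conditions on: a target n-gram history plus a window of source words around the aligned source position. Everything in it is illustrative; it is not the Moses "BilingualLM" implementation:

```python
# Toy bilingual neural joint model: P(next target word | target n-gram history,
# source window around the aligned source position). Names, sizes, weights, and
# the example sentences are made up; this is not the actual Moses/NPLM code.
import numpy as np

rng = np.random.default_rng(0)
src_vocab = ["<s>", "das", "haus", "ist", "klein", "</s>"]
tgt_vocab = ["<s>", "the", "house", "is", "small", "</s>"]
hist, win, emb, hid = 3, 5, 16, 64   # 3 target history words, 5-word source window

E_src = rng.normal(0, 0.1, (len(src_vocab), emb))
E_tgt = rng.normal(0, 0.1, (len(tgt_vocab), emb))
W1 = rng.normal(0, 0.1, (hid, (hist + win) * emb))   # all embeddings -> hidden layer
W2 = rng.normal(0, 0.1, (len(tgt_vocab), hid))       # hidden -> target word scores

def joint_model_probs(tgt_history, src_sent, aligned_pos):
    """P(next target word | target history + source window centred on aligned_pos)."""
    half = win // 2
    # Source window around the aligned word, clamped at the sentence boundaries.
    window = [src_sent[min(max(aligned_pos + d, 0), len(src_sent) - 1)]
              for d in range(-half, half + 1)]
    x = np.concatenate([E_tgt[tgt_vocab.index(w)] for w in tgt_history] +
                       [E_src[src_vocab.index(w)] for w in window])
    h = np.tanh(W1 @ x)
    scores = W2 @ h
    p = np.exp(scores - scores.max())
    return p / p.sum()

# Untrained weights, so the prediction is arbitrary; it only shows the inputs.
src = ["<s>", "das", "haus", "ist", "klein", "</s>"]
p = joint_model_probs(["<s>", "the", "house"], src, aligned_pos=3)   # next word should be "is"
print(tgt_vocab[int(np.argmax(p))])
```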

*** For Reordering Model:
1) Advancements in Reordering Models for Statistical Machine Translation (Minwei Feng et al., ACL 2013)

2) A Neural Reordering Model for Phrase-based Translation (Peng Li et al., COLING 2014)

3) (to be updated)

Monday, 8 December 2014

Competitive Programming Book

Link: https://sites.google.com/site/stevenhalim/
Intro: a book for programming contests

VisuAlgo

Intro: a tool to help his students better understand data structures and algorithms by allowing them to learn the basics on their own and at their own pace.