Here is a non-exhaustive list of word aligners used in machine translation:
1) Unsupervised Aligners
- GIZA++
- fast_align (with cdec)
- pialign
- BerkeleyAligner
2) Supervised Aligners
- BerkeleyAligner
- NILE
(to be updated ...)
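Whichever aligner you pick, a common next step is to combine the alignments from the two translation directions. fast_align, for instance, writes each sentence pair as whitespace-separated `i-j` index pairs. Below is a minimal sketch of intersection symmetrization over that format. The function names are my own, and it assumes both directions have already been written with the source index first.

```python
def parse_alignment(line):
    """Parse 'i-j' tokens into a set of (source_index, target_index) links."""
    links = set()
    for token in line.split():
        i, j = token.split("-")
        links.add((int(i), int(j)))
    return links


def symmetrize_intersection(forward_line, reverse_line):
    """Keep only the links that both alignment directions agree on."""
    return sorted(parse_alignment(forward_line) & parse_alignment(reverse_line))


# Hypothetical alignment lines for one sentence pair.
links = symmetrize_intersection("0-0 1-1 2-1", "0-0 1-1 1-2")
```

Intersection gives high-precision links; other heuristics that add back links from the union trade some of that precision for recall.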
Friday, 9 January 2015
Tuesday, 6 January 2015
Multi-Task Learning toolkit
1) MALSAR
Link: http://www.public.asu.edu/~jye02/Software/MALSAR/
Intro: The MALSAR (Multi-tAsk Learning via StructurAl Regularization) package includes the following multi-task learning algorithms:
- Mean-Regularized Multi-Task Learning
- Multi-Task Learning with Joint Feature Selection
- Robust Multi-Task Feature Learning
- Trace-Norm Regularized Multi-Task Learning
- Alternating Structural Optimization
- Incoherent Low-Rank and Sparse Learning
- Robust Low-Rank Multi-Task Learning
- Clustered Multi-Task Learning
- Multi-Task Learning with Graph Structures
- Disease Progression Models
- Incomplete Multi-Source Fusion (iMSF)
- Multi-Stage Multi-Source Fusion
- Multi-Task Clustering
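To give a feel for the first item in the list, here is a rough sketch of mean-regularized multi-task learning in plain Python. This is not MALSAR's own MATLAB interface. It assumes a squared loss per task plus a penalty (lam / 2) * ||w_t - mean(w)||^2 that pulls every task's weight vector toward the average, and it uses plain gradient descent. The function name, `lam`, `lr` and `steps` are my own choices.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean_regularized_mtl(tasks, lam=1.0, lr=0.1, steps=500):
    """tasks: list of (X, y), where X is a list of feature rows and y a list of targets.

    Returns one weight vector per task.
    """
    dim = len(tasks[0][0][0])
    weights = [[0.0] * dim for _ in tasks]
    for _ in range(steps):
        # Average weight vector across tasks (the shared "mean" model).
        mean = [sum(w[k] for w in weights) / len(weights) for k in range(dim)]
        updated = []
        for (X, y), w in zip(tasks, weights):
            residuals = [dot(row, w) - target for row, target in zip(X, y)]
            # Gradient of the task's mean squared loss plus the pull toward the mean.
            grad = [
                sum(r * row[k] for r, row in zip(residuals, X)) / len(X)
                + lam * (w[k] - mean[k])
                for k in range(dim)
            ]
            updated.append([w[k] - lr * grad[k] for k in range(dim)])
        weights = updated
    return weights


# Two toy tasks with one feature each: task 1 prefers w = 1, task 2 prefers w = 3.
toy_tasks = [([[1.0]], [1.0]), ([[1.0]], [3.0])]
shared = mean_regularized_mtl(toy_tasks, lam=1.0)
independent = mean_regularized_mtl(toy_tasks, lam=0.0)
```

With `lam=0.0` the tasks are fit independently; increasing `lam` moves the per-task weights closer to each other.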
2)
Link: http://klcl.pku.edu.cn/member/sunxu/software/MultiTask.zip
Intro: This is general-purpose software for online multi-task learning. It is mainly based on the Conditional Random Fields (CRF) model and Stochastic Gradient Descent (SGD) training.
I plan to explore this technique further for machine translation and domain adaptation.
Labels: machine learning, MTL, multi-task learning, NLP, research, toolkit
Monday, 5 January 2015
MTTK - Machine Translation Toolkit
Intro: MTTK is a collection of software tools for the alignment of parallel text for use in Statistical Machine Translation. With MTTK you can ...
- Align document translation pairs at the sentence or sub-sentence level, sometimes known as chunking. This is a useful pre-processing step to prepare collections of translations for use in estimating the parameters of complex alignment models. Sub-sentence alignment in particular makes it possible to segment long sentences into shorter aligned segments that otherwise would have to be discarded.
- Train statistical models for parallel text alignment. The following models are supported:
  - IBM Model-1 and Model-2
  - Word-to-Word HMMs
  - Word-to-Phrase HMMs, with bigram translation probabilities
- Parallelize your model training procedures. If you have multiple CPUs available, you can partition your translation training texts into subsets, thus speeding up iterative parameter re-estimation procedures and reducing the amount of memory needed in training. This is done under exact EM-based parameter estimation procedures.
- Generate word-to-word and word-to-phrase alignments of parallel text. MTTK can generate Viterbi alignments of parallel text (both training text and other texts) under the supported alignment models.
- Extract word-to-word translation tables from aligned bitext and from the estimated models.
- Extract phrase-to-phrase translation tables (phrase-pair inventories) from aligned parallel text.
- Use the HMM alignment models to induce phrase translations under their statistical models. Phrase-pair induction can generate richer inventories of phrase translations than can be extracted from Viterbi alignments.
- Edit the C++ source code to implement your own estimation and alignment procedures.
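To make a few of the points above concrete (Model-1 training, partitioned but exact EM, and Viterbi alignment), here is a small Python sketch. It is not MTTK code and does not reflect its C++ interfaces. All function names are my own, the toy bitext is made up, and the empty (NULL) source word is left out to keep it short. The key idea for parallel training is that expected counts are collected per shard and only summed before the M-step, so the result matches unpartitioned EM.

```python
from collections import defaultdict


def init_uniform(bitext):
    """Uniform t(f|e) over every co-occurring (target word, source word) pair."""
    tgt_vocab = {f for _, tgt in bitext for f in tgt}
    return {(f, e): 1.0 / len(tgt_vocab)
            for src, tgt in bitext for f in tgt for e in src}


def collect_counts(shard, t):
    """E-step on one shard: expected pair counts and source-word totals."""
    pair_counts = defaultdict(float)
    src_counts = defaultdict(float)
    for src, tgt in shard:
        for f in tgt:
            norm = sum(t[(f, e)] for e in src)
            for e in src:
                posterior = t[(f, e)] / norm
                pair_counts[(f, e)] += posterior
                src_counts[e] += posterior
    return pair_counts, src_counts


def em_step(shards, t):
    """One exact EM iteration: sum the per-shard counts, then renormalize."""
    pair_total = defaultdict(float)
    src_total = defaultdict(float)
    for shard in shards:  # each shard could be processed on its own CPU
        pair_counts, src_counts = collect_counts(shard, t)
        for key, value in pair_counts.items():
            pair_total[key] += value
        for key, value in src_counts.items():
            src_total[key] += value
    return {(f, e): c / src_total[e] for (f, e), c in pair_total.items()}


def train(shards, iterations=10):
    t = init_uniform([pair for shard in shards for pair in shard])
    for _ in range(iterations):
        t = em_step(shards, t)
    return t


def viterbi_align(src, tgt, t):
    """Link each target word to its most probable source word under Model 1."""
    links = []
    for j, f in enumerate(tgt):
        best = max(range(len(src)), key=lambda i: t.get((f, src[i]), 0.0))
        links.append((best, j))
    return links


# Toy bitext of (source words, target words) pairs.
bitext = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]
table = train([bitext[:2], bitext[2:]])  # two shards
alignment = viterbi_align(["the", "house"], ["das", "Haus"], table)
```

Because Model 1 ignores word positions, the Viterbi alignment reduces to an independent argmax per target word. The HMM models listed above would need a proper dynamic-programming search instead.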
Labels: alignment, IBM models, MT, NLP, parallel text, research, SMT, toolkit