Wednesday 29 August 2012

Fangorn: a system for querying very large treebanks

Link: http://nltk.ldc.upenn.edu:9090/index
Intro: Fangorn is an open source tool for querying very large treebanks, built on top of Apache Lucene.  Fangorn implements the LPath linguistic path language, which has an XPath-like syntax along with linguistically motivated extensions.  Result trees are annotated with the query in order to show how the query matched the tree, and these annotations can themselves be modified and submitted as further queries.

Tuesday 28 August 2012

Intel® Threading Building Blocks (Intel® TBB)

Intro: Intel® Threading Building Blocks (Intel® TBB) offers a rich and complete approach to expressing parallelism in a C++ program. It is a library that helps you take advantage of multi-core processor performance without having to be a threading expert. Intel TBB is not just a threads-replacement library. It represents a higher-level, task-based parallelism that abstracts platform details and threading mechanisms for scalability and performance. 

Monday 27 August 2012

ACCURAT Toolkit

Intro: The ACCURAT project (http://www.accurat-project.eu/) is pleased to announce the release of the ACCURAT Toolkit, a collection of tools for collecting comparable corpora, aligning them at multiple levels, and extracting information from them. By using the ACCURAT Toolkit, users may obtain:
- Comparable corpora from the Web (current news corpora, filtered Wikipedia corpora, and narrow domain focussed corpora);
- Comparable document alignments;
- Semi-parallel sentence/phrase mapping from comparable corpora (for SMT training purposes or other tasks);
- Translated terminology extracted and mapped from bilingual comparable corpora;
- Translated named entities extracted and mapped from bilingual comparable corpora.