Saturday, 18 June 2011

Detection of Errors and Correction in Corpus Annotation

Intro: The success of data-driven approaches and stochastic modeling in computational linguistic research and applications is rooted in the availability of electronic natural language corpora. Despite the central role that annotated corpora play in this work, the question of how errors in corpus annotation can be detected and corrected has received little attention. The DECCA project is designed to address this gap by exploring an error detection and correction method with potential applicability to a wide range of corpus annotations.