Monday, 1 December 2014

The ClueWeb09 Dataset

Intro: The ClueWeb09 dataset was created to support research on information retrieval and related human language technologies. It consists of about 1 billion web pages in ten languages that were collected in January and February 2009. The dataset is used by several tracks of the TREC conference.
Note: A huge corpus, useful for language modeling (LM).
