HOANG Cong Duy Vu's research logs: The ClueWeb09 Dataset

Monday, 1 December 2014

The ClueWeb09 Dataset

Link: http://www.lemurproject.org/clueweb09/

Intro: The ClueWeb09 dataset was created to support research on information retrieval and related human language technologies. It consists of about 1 billion web pages in ten languages that were collected in January and February 2009. The dataset is used by several tracks of the TREC conference.

Note: A huge corpus, useful for language modeling (LM).
