Thursday 5 February 2015

Available Data from the CommonCrawl

Link: http://statmt.org/ngrams/
Intro: Multilingual data crawled from CommonCrawl, used for training large-scale language models (LMs).