Thursday, 12 May 2011

Google Books Corpus

http://googlebooks.byu.edu/

"This corpus is based on the American English portion of the Google Books data (see http://ngrams.googlelabs.com and especially http://ngrams.googlelabs.com/datasets). It contains 155 *billion* words (155,000,000,000) in more than 1.3 million books from the 1810s-2000s (including 62 billion words from just 1980-2009).

The corpus has most of the functionality of the other corpora from http://corpus.byu.edu (e.g. COCA, COHA, and our interface to the BNC), including: searching by part of speech, wildcards, and lemma (and thus advanced syntactic searches), synonyms, collocate searches, frequency by decade (tables listing each individual string, or charts for total frequency), comparisons of two historical periods (e.g. collocates of "women" or "music" in the 1800s and the 1900s), and more." (From Corpora-List)
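
A note on reading the "frequency by decade" and two-period comparisons: the periods differ a lot in size (62 billion of the 155 billion words come from 1980-2009), so raw counts are not directly comparable. Below is a minimal sketch of per-million normalization in Python. The corpus sizes are the ones quoted in the announcement; the function name and the raw counts are hypothetical and invented for illustration, and none of this is the corpus interface's own code.

```python
# Corpus sizes quoted in the announcement (in words).
TOTAL_WORDS = 155_000_000_000
WORDS_1980_2009 = 62_000_000_000
WORDS_BEFORE_1980 = TOTAL_WORDS - WORDS_1980_2009  # the remaining 93 billion


def per_million(raw_count: int, period_words: int) -> float:
    """Return occurrences per million words for one period (hypothetical helper)."""
    if period_words <= 0:
        raise ValueError("period_words must be positive")
    return raw_count * 1_000_000 / period_words


# Hypothetical raw counts for some search string. NOT real corpus data.
hypothetical_counts = {
    "before 1980": (930_000, WORDS_BEFORE_1980),
    "1980-2009": (930_000, WORDS_1980_2009),
}

for period, (count, size) in hypothetical_counts.items():
    print(f"{period}: {per_million(count, size):.1f} per million words")
```

With identical raw counts, the smaller period still comes out with the higher normalized frequency, which is why the per-million figure is the one to compare across periods.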