Wednesday, April 3, 2013

Finding language in context - Google site search

Let's say you've found a new word. You've looked it up in a dictionary and found an example sentence or two. Perhaps you've also found what other words it collocates with. (I'll be looking at some online tools for finding collocates in another post.) What you'd like to do now is to see how it's used in context.
One possibility is to use a corpus ("a collection of samples of real-world texts stored on computer. Plural - corpora" - Leoxicon), but corpora can sometimes be difficult to use, and when they include spoken language, the grammar is occasionally "non-standard", let's say. The British National Corpus is easy to use, but be careful with examples from spoken language.
Another way is to do a simple Google (or other) search. The Internet is one enormous corpus if you think about it, although no linguist has "collected" these examples. But a simple search can bring up a lot of irrelevant material, and again you're not really assured of grammatical correctness.
What I like to do is a Google site search of trusted newspapers and other websites, which are in effect small corpora, or look in Google Books, where the material has been edited and proofread, so is likely to be grammatically correct.

With a Google site search, you enter your search term (which I like to put in inverted commas, so that Google only looks for the words when they appear together as an exact phrase), followed by site: and the address of the website (without http://). So if I wanted to find examples of "highly unlikely" on the Guardian website, I'd enter:
  • "highly unlikely"
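If you'd rather build this kind of search in a script than type it by hand, here is a minimal Python sketch of the pattern described above: the phrase in inverted commas, then site: and the bare address. The function names are my own, theguardian.com is an assumed address used only for illustration, and the https://www.google.com/search?q=... form is simply the ordinary Google search URL.

```python
from urllib.parse import urlencode, urlsplit


def site_search_query(phrase: str, site: str) -> str:
    """Return the text to type into Google: "phrase" site:address."""
    # site: expects a bare address, so drop any scheme such as http://
    parts = urlsplit(site if "//" in site else "//" + site)
    address = parts.netloc + parts.path.rstrip("/")
    # Inverted commas make Google look for the words together.
    return f'"{phrase}" site:{address}'


def site_search_url(phrase: str, site: str) -> str:
    """Return a Google search URL for the same query."""
    query = site_search_query(phrase, site)
    return "https://www.google.com/search?" + urlencode({"q": query})


# theguardian.com is an assumed address, used here only as an example.
print(site_search_query("highly unlikely", "http://theguardian.com/"))
print(site_search_url("highly unlikely", "theguardian.com"))
```

The first line printed is what you would type into the search box; the second is the same search as a link you could paste into a browser.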
To make things easier, I've put together a simple tool to look up words and expressions on various newspaper and other sites. Just enter a word or expression into the Entry Box and click on one of the links. (Try it with the examples.) I'll probably add some more sites to it later.
A note on books - clicking on Google Books searches all books digitised by Google. Google Books also has an advanced search facility, where you can, for example, choose to search only modern books. Project Gutenberg is a digital collection of out-of-copyright books, so it has all the classics but few modern books.
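For the curious, a lookup tool along these lines can be sketched in a few lines of Python. This is not the code behind the tool on this page, just an illustration of the idea: a table of site names and addresses, and one search link generated per site. The SITES table, the addresses in it and the function name are all assumptions made for the example.

```python
from urllib.parse import urlencode

# Hypothetical lookup table: the labels are site names, and the
# addresses are assumed for illustration only.
SITES = {
    "The Guardian": "theguardian.com",
    "New York Times": "nytimes.com",
    "New Scientist": "newscientist.com",
}


def lookup_links(expression: str) -> dict[str, str]:
    """Map each site label to a Google site-search URL for the expression."""
    links = {}
    for label, address in SITES.items():
        # Same pattern as a hand-typed search: "expression" site:address
        query = f'"{expression}" site:{address}'
        links[label] = "https://www.google.com/search?" + urlencode({"q": query})
    return links


for label, url in lookup_links("highly unlikely").items():
    print(f"{label}: {url}")
```

Adding another site is just a matter of adding one more line to the table.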
  • British quality press: The Guardian, The Independent, The Telegraph, The Times, The Financial Times, The Economist
  • British tabloids: The Daily Mail, The Daily Express, The Mirror, The Sun
  • American press: New York Times, Washington Post, Herald-Tribune, Chicago Tribune, San Francisco Chronicle, Los Angeles Times, Miami Herald, Wall Street Journal, Huffington Post, Time Magazine
  • National Geographic, Geographical (UK), Discover Wildlife (BBC)
  • History Today, History Extra (BBC), History Channel
  • New Scientist, Scientific American, Science Focus (BBC)
  • Project Gutenberg, Google Books


  • Leoxicon - a blog dedicated to the use of corpora in the teaching of English to foreign students.


