Since the early 1990s a large number of corpora consisting of texts from various periods in the history of English have become available. The first major corpus in this area was the Helsinki Corpus of English Texts (1991, 1993), which includes extracts from various works ranging from Old English to the late modern period. Various additional corpora have since been compiled at the University of Helsinki, each focussing on a selection of texts either of a particular genre, e.g. personal correspondence or medical texts, or from a particular region, e.g. Scottish texts. Other universities soon followed suit, and by the end of the decade quite an impressive range of corpora was available.

A selection of corpora is listed below to convey an impression of the variety and coverage of those currently available. This is an expanding field: new corpora appear with each passing year, and some of them are put in the public domain by their compilers.

| Name | Compiling institution / individuals |
| --- | --- |
| ARCHER, a corpus of British and American English from 1650-1990 | Douglas Biber and associates at Northern Arizona University in collaboration with colleagues at the University of Freiburg, Germany |
| Australian Corpus of English | Department of Linguistics, Macquarie University, NSW, Australia |
| Bank of English | University of Birmingham, sponsored by the publisher HarperCollins |
| British National Corpus | Consortium under the aegis of Oxford University Press |
| The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English | A parsed section of the original Helsinki corpus prepared by a number of linguists |
| Brown Corpus of Standard American English | W. Nelson Francis and Henry Kucera, Brown University, Providence, Rhode Island |
| Corpus of 19th Century English | Merja Kytö and associates, Uppsala University, Sweden |
| Corpus of Dialogues | Merja Kytö, Uppsala University, Sweden and Jonathan Culpeper, Lancaster University, England |
| Corpus of Early English Correspondence | Terttu Nevalainen and Helena Raumolin-Brunberg, University of Helsinki, Finland |
| A Corpus of Irish English | Raymond Hickey, Essen University, Germany (packaged with Corpus Presenter, Software for Language Analysis, Amsterdam: John Benjamins, 2003) |
| Corpus of London Teenage Language (COLT) | Anna-Britta Stenström and associates, Department of English, University of Bergen |
| Corpus of Middle English Prose and Verse | University of Michigan, Michigan |
| Freiburg-Brown Corpus of American English (FROWN) | Christian Mair and associates, University of Freiburg, Germany |
| Freiburg-LOB Corpus of British English (FLOB) | Christian Mair and associates, University of Freiburg, Germany |
| The Helsinki Corpus of Older Scots | Anneli Meurman-Solin, Department of English, University of Helsinki, Finland |
| Innsbruck Corpus Archive of Middle English Texts (ICAMET) | Manfred Markus, University of Innsbruck, Austria |
| International Corpus of English (ICE), collection of corpora from various anglophone countries, now (2005) partially completed | Co-ordinated by the Department of English, University College London, England |
| Kolhapur Corpus of Indian English | Shivaji University, Kolhapur |
| Lampeter Corpus of Early Modern English Tracts | Josef Schmied, Technical University Chemnitz, Germany |
| Lancaster-Oslo-Bergen Corpus of British English | Collaborative effort of the universities in the three cities named in the title |
| London-Lund Corpus of Spoken English | Departments of English at University College London, England and Lund University, Sweden |
| Middle English Medical Texts | Irma Taavitsainen, Päivi Pahta and Martti Mäkinen, Department of English, University of Helsinki, Finland. Retrieval software by Raymond Hickey. Published by John Benjamins, 2005. |
| Northern Ireland Transcribed Corpus of Speech (NITCS) | John Kirk, Department of English, Queen’s University, Belfast, Northern Ireland |
| Penn-Helsinki Parsed Corpus of Middle English | University of Pennsylvania, Philadelphia, Pennsylvania |
| Old Bailey Court Depositions | Department of History, University of Sheffield |
| Santa Barbara Corpus of Spoken American English | University of California, Santa Barbara |
| Zurich English Newspaper Corpus | Udo Fries and associates, Department of English, Zurich University |

Download software

If you want to see how corpus software works, you can download a free version of my software package, Corpus Presenter. This was published in 2003 as a book and CD entitled Corpus Presenter, Software for Language Analysis (Amsterdam: John Benjamins). The version supplied with the book is 7.0; version 12.0 (Build February 2012) is available from a dedicated website which I maintain for the program. For more details, go to the Corpus Presenter website.

The version which you can download via the link below contains all the functions of the full program except the third and most sophisticated level of text retrieval. Even with the lite version, however, you can carry out refined searches across sets of texts and use wild-cards, sets of input forms, etc. All search results can be stored to disk or copied to the Windows clipboard.
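
As a rough illustration of what a wild-card search across a set of texts involves, here is a minimal sketch in Python. It does not show how Corpus Presenter is implemented: the function name `wildcard_search`, the in-memory dictionary of texts and the crude tokenisation are all assumptions made purely for this example.

```python
import fnmatch
import re


def wildcard_search(texts, pattern, context=3):
    """Find every word matching a wild-card pattern in a set of texts.

    texts   -- dict mapping a text name to its contents (hypothetical mini-corpus)
    pattern -- shell-style wild-card: * = any characters, ? = one character
    context -- number of words of context to keep on either side of a hit

    Returns a list of (text_name, left_context, match, right_context) tuples.
    """
    # Turn the wild-card into a regular expression; ignore upper/lower case.
    regex = re.compile(fnmatch.translate(pattern), re.IGNORECASE)
    hits = []
    for name, content in texts.items():
        # Crude tokenisation (an assumption): every run of letters is a word.
        words = re.findall(r"[^\W\d_]+", content)
        for i, word in enumerate(words):
            if regex.match(word):
                left = " ".join(words[max(0, i - context):i])
                right = " ".join(words[i + 1:i + 1 + context])
                hits.append((name, left, word, right))
    return hits


if __name__ == "__main__":
    # Two invented snippets standing in for files loaded from disk.
    sample = {
        "text_a": "Love is not love which alters",
        "text_b": "more lovely and more temperate",
    }
    for name, left, word, right in wildcard_search(sample, "lov*", context=2):
        print(f"{name}: {left} [{word}] {right}")
```

Each hit is returned with a few words of context on either side, which is roughly the information a concordance line displays. Real corpus software adds a good deal on top of this, for instance handling of markup, spelling variants and much larger files.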

To allow you to get moving quickly, I have enclosed a small test corpus – called “SmallSampleCorpus.cpd” – which contains extracts from Beowulf, Chaucer's Canterbury Tales, some items by Shakespeare (two plays and the sonnets) as well as a number of Irish pieces. You can start doing searches with this corpus straight away. If you have your own files (in plain text, RTF, HTML or XML format) you can search through these equally well. Just select your files from the initial file listing and load them directly.

   Corpus Presenter

The file you download is a ZIP file. Unzip it to any directory you like and then start the program SETUP.EXE, which will be among the files extracted from the ZIP file. The setup procedure will suggest installing Corpus Presenter Lite to the directory C:\Program Files\Corpus Presenter Lite, which you should let it do. Once the installation is complete (and you have restarted your computer), there will be an entry Corpus Presenter in the list under Start - Programs on the desktop of your computer.

Corpus Presenter works best with Windows XP (Service Pack 3), Windows Vista and Windows 7 (and presumably later versions when these appear). Using versions of the operating system older than Windows XP is not advised. For legal reasons, I must stress that you use the program at your own risk. It can be removed easily from your computer via the Add/Remove Software module in the Control Panel of Windows.

Dedicated journals

ICAME Journal, 1996- University of Bergen, Norway.
International Journal of Corpus Linguistics, 1996- Amsterdam: John Benjamins.
Corpus Linguistics and Linguistic Theory, 2005- Berlin: Mouton de Gruyter.


