Sources for varieties of English

Text corpora
Dedicated journals

Historical records

Information about varieties in previous centuries can be gleaned from a number of sources. These can be classified by type. Each type has its own advantages and disadvantages. The more types of historical record available, the better. Frequently, one has to make do with a limited set of sources and reconstruct features of varieties on the basis of fragmentary material.

Emigrant letters: People who emigrated in previous centuries wrote back home, usually to maintain contact with friends and relatives, and many of these letters survive in archives today. Such material is usually non-prescriptive, i.e. written in a colloquial style without undue consideration of normative grammar. Hence it is a good source of information on varieties and, when used judiciously, can be useful for linguistic analyses.
Personal accounts: Apart from letters, there are also documents of various kinds in which speakers offer personal accounts of their lives and experiences. Some of these have been recorded deliberately, e.g. the accounts of life under slavery or in other adverse conditions. Such texts are not normally written in the variety in question, unless verbatim transcripts of what informants said are used. In this context one could also mention court records in which the statements of accused persons and/or defendants were written down by court clerks.
Dialect glossaries: From the 17th century onwards, a certain antiquarian interest in dialect vocabulary can be observed. Collections of words from diverse regions of the British Isles are available and are often a good source of material on the varieties spoken there. Such material is almost entirely lexical, i.e. information about pronunciation and grammar is not normally included.
Literary satires: As early as Chaucer (in the 14th century) one finds dialect material used to characterise figures in literary works. Shakespeare and Ben Jonson are prominent Elizabethan writers who kept up this tradition. Many satires contain figures from the Celtic regions, i.e. Irish, Scottish or Welsh characters, especially in drama from the 17th century onwards. The accuracy of such portrayals is often doubtful because many of the authors were English and did not have first-hand knowledge of the speech they were satirising. In addition, there are limits on the linguistic features which can be represented using so-called ‘eye dialect’, i.e. changes in spelling to indicate dialect traits in writing.
Rhyming material: End rhyme, in poetry and sometimes in drama, can be a source of information on the pronunciation of vowels. For instance, one could check whether ‘eat’ and ‘great’ or ‘past’ and ‘waste’ rhyme for a particular author. This could indicate whether the first word in each pair still had the vowel /e:/ or /a:/ respectively.
Prescriptive comments: From the 18th century onwards, there are many works in which authors complain about regional pronunciation and grammar. This is connected with the rise of prescriptivism, i.e. strict notions of what is ‘correct’ in language, of which variety was taken to be socially acceptable and, by implication, of which other forms were not. Authors often cite supposedly ‘incorrect’ usage and thus inadvertently supply present-day linguists with information about regional varieties of English in previous centuries.
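
The rhyme check described under rhyming material can be roughly sketched in code. The following is a minimal illustration, not an established tool: it assumes the text is written in rhyming couplets, and the function names and the four sample lines are invented here purely for demonstration. A rhyme found in this way is only a hint, as spelling-based or conventional rhymes need not reflect actual pronunciation.

```python
import re
from collections import Counter


def end_word(line):
    """Return the last word of a verse line in lower case, or None if the line has no word."""
    words = re.findall(r"[A-Za-z']+", line)
    return words[-1].lower() if words else None


def couplet_rhyme_pairs(lines):
    """Count end-word pairs for lines 1-2, 3-4, ... (assumes rhyming couplets)."""
    ends = [w for w in (end_word(line) for line in lines) if w]
    return Counter(
        frozenset((ends[i], ends[i + 1]))
        for i in range(0, len(ends) - 1, 2)
    )


def rhymes_attested(pairs, word_a, word_b):
    """Return how often the two words are paired as a couplet rhyme."""
    return pairs[frozenset((word_a, word_b))]


# Invented sample lines for illustration only; not taken from any historical author.
SAMPLE = [
    "They had no bread nor any meat to eat,",
    "And yet their hope in better days was great;",
    "The years of plenty now were long since past,",
    "And all the harvest gone to waste.",
]

pairs = couplet_rhyme_pairs(SAMPLE)
print(rhymes_attested(pairs, "eat", "great"))
print(rhymes_attested(pairs, "past", "waste"))
```

For real material one would replace the couplet assumption with the rhyme scheme of the text at hand and compare counts across authors and periods.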

Available corpora

Since the early 1990s a large number of corpora have become available, some of them specific to certain varieties of English. A selection of such sources is given below. These corpora are in the main concerned with documenting the standard variety of the country where they are compiled. This is particularly true of the ICE corpora (compiled as part of the large and ongoing project, the International Corpus of English, coordinated by the Department of English, University College London). The sub-corpora of this project are labelled with the acronym followed by the region or country in question, e.g. ICE-East Africa or ICE-Ireland. A full list of the currently available corpora can be found on the main website for the entire project (see relevant entry in the following table).

Most of the universities involved in the compilation of such corpora have websites with additional information. The field of variety corpora is expanding, and with each passing year new corpora appear, some of which are put in the public domain by their compilers. As can be seen from the following list, many corpora are in fact dedicated to forms of English in the early modern period (from the 17th century to the present day). This time span is important as it covers the period during which English was transported overseas.

| Name | Compiling institution / individuals |
| --- | --- |
| ARCHER, a corpus of British and American English from 1650 to 1990 | Douglas Biber and associates at Northern Arizona University, in collaboration with colleagues at the University of Freiburg, Germany |
| Australian Corpus of English | Department of Linguistics, Macquarie University, NSW, Australia |
| Bank of English | University of Birmingham, sponsored by the publisher HarperCollins |
| British National Corpus | Consortium under the aegis of Oxford University Press |
| Brown Corpus of Standard American English | W. Nelson Francis and Henry Kučera, Brown University, Providence, Rhode Island |
| Corpus of 19th Century English | Merja Kytö and associates, Uppsala University, Sweden |
| Corpus of English Dialogues | Merja Kytö, Uppsala University, Sweden and Jonathan Culpeper, Lancaster University, England |
| Corpus of Early English Correspondence | Terttu Nevalainen and Helena Raumolin-Brunberg, University of Helsinki, Finland |
| A Corpus of Irish English | Raymond Hickey, Essen University, Germany (packaged with Corpus Presenter, Software for Language Analysis, Amsterdam: John Benjamins, 2003) |
| Corpus of London Teenage Language (COLT) | Anna-Britta Stenström and associates, Department of English, University of Bergen |
| Freiburg-Brown Corpus of American English (FROWN) | Christian Mair and associates, University of Freiburg, Germany |
| Freiburg-LOB Corpus of British English (FLOB) | Christian Mair and associates, University of Freiburg, Germany |
| Freiburg Corpus of English Dialects (FRED) | Bernd Kortmann and associates, University of Freiburg, Germany |
| The Helsinki Corpus of Older Scots | Anneli Meurman-Solin, Department of English, University of Helsinki, Finland |
| International Corpus of English (ICE), collection of corpora from various anglophone countries, now (2005) partially completed | Co-ordinated by the Department of English, University College London, England |
| Kolhapur Corpus of Indian English | Shivaji University, Kolhapur |
| The Newcastle Electronic Corpus of Tyneside English (NECTE) | Karen Corrigan, School of English Literature, Language, and Linguistics, University of Newcastle upon Tyne |
| Northern Ireland Transcribed Corpus of Speech (NITCS) | John Kirk, Department of English, Queen’s University, Belfast, Northern Ireland |
| Old Bailey Court Depositions | Department of History, University of Sheffield |
| Santa Barbara Corpus of Spoken American English | University of California, Santa Barbara |

Dedicated journals

American Speech
English Language and Linguistics
English Today
English World-Wide
Language in Society
Journal of English Linguistics
Journal of Pidgin and Creole Languages
Journal of Sociolinguistics
Language Variation and Change


For a comprehensive list of relevant books, see the References branch towards the bottom of the tree on the left.