   Download software

Corpus Presenter, Version 14   (c. 40 MB Zip file). Build: December 2014.

Files for the Helsinki corpus

Corpus Presenter Version 14 is a major upgrade over previous versions of the program and contains many significant enhancements. It is entirely free of charge and does not require registration or a password.

Installation process greatly simplified and improved

All you need do is download the ZIP file from the link in the above box and then extract the single .EXE file called Corpus_Presenter_14_Setup.exe. Double click on this file to open it. It will install Corpus Presenter 14 in less than a minute on any Windows-based computer system.

Apart from Corpus Presenter itself, I program adaptations of this corpus software for university colleagues who have prepared, or are preparing, a corpus of their own and wish to have sophisticated retrieval software to go with it. So far such adaptations have been published by the Dutch linguistics publisher John Benjamins, who also published the original Corpus Presenter (Version 7) in 2003. Colleagues interested in an adaptation of Corpus Presenter to accompany a corpus which they would like to publish are advised to contact me at

If you have installed a previous version of Corpus Presenter on your computer, please remove it first (via the Add/Remove Software module in the Control Panel). Delete the directory C:\Program Files\Corpus Presenter as well.

Corpus Presenter works best with Windows XP / Windows Vista / Windows 7 / Windows 8 or 8.1. Make sure that you have installed the latest Service Pack for Windows XP, currently Service Pack 3, June 2009, available from Microsoft at the following address: (Windows XP, Service Pack 3)

You are advised not to use versions of the operating system older than Windows XP (i.e. not Windows 2000 and certainly not Windows 98). For legal reasons, I must stress that you use Corpus Presenter at your own risk. The program can be removed easily from your computer via the Add/Remove Software module in the Control Panel of Windows XP (called the Programs and Features module in Windows 7 and later versions).

The current version of Corpus Presenter is Version 14, build: December 2014 (see download link above). It follows on Version 13 (July 2013), Version 12 (September 2012), Version 11 (February 2009) and previous versions, including Version 7 supplied with the book. You do not need previous versions to run Version 14. You can update from the CD with the book (Version 7) directly to Version 14. Please uninstall all previous versions and re-start your computer before installing Version 14.

The book Corpus Presenter can be purchased from John Benjamins (see the relevant section of their website). If either you or your library purchases the book, this entitles you to support from the author.

   Helsinki Corpus Files (size: 91 KB)

The Helsinki Corpus of English Texts consists of 242 text files which are located in a single directory. I have constructed a data set file – Helsinki_Corpus.cpd – which will display the Helsinki Corpus as a hierarchical tree divided into layers according to period (Old, Middle and Early Modern English) and sub-period and then by genre as can be seen in the following screen shot.

The second file is intended for use with the supplied utility Corpus Presenter Find Text. This file, Helsinki_Codes.lst, replaces the sequences of "+" and a letter with the actual Old and Middle English symbols (ash, thorn and eth) in all the texts of the corpus which contain them. This makes the Old and Middle English texts much more readable. Bear in mind that the symbols ash, thorn and eth can be accessed in Corpus Presenter modules by clicking on the OE/ME button, e.g. in the search options window of the Quick search or the parameters window on the Advanced search level.

To carry out the replacements, do the following. Unzip the download file from the above link to the directory in which you keep the files of the Helsinki Corpus. Start Corpus Presenter Find Text and enter this directory. Choose Helsinki_Codes.lst as the file with the input forms for the Find / Replace operation. Select all the forms and click on the Proceed button. When the files have been processed, all replacements will have been made, some 202,550 in all. The procedure may take some minutes; that is normal.
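For readers who would like to see what this Find / Replace operation amounts to, here is a minimal Python sketch. It is not part of Corpus Presenter and does not read Helsinki_Codes.lst. The mapping of codes to symbols (+a, +t, +d and their uppercase counterparts) is my assumption about what the list contains, and the names ASSUMED_CODES and replace_codes are made up for this illustration.

```python
import re

# ASSUMPTION: these code/symbol pairs are a guess based on the description
# above ("+" and a letter for ash, thorn and eth). The real contents of
# Helsinki_Codes.lst may differ.
ASSUMED_CODES = {
    "+a": "æ", "+A": "Æ",   # ash
    "+t": "þ", "+T": "Þ",   # thorn
    "+d": "ð", "+D": "Ð",   # eth
}


def replace_codes(text, codes=ASSUMED_CODES):
    """Replace each code found in `text` with its symbol.

    Returns a tuple (new_text, number_of_replacements).
    """
    # One alternation pattern, so the text is scanned once and a
    # replacement can never be matched again by a later code.
    pattern = re.compile("|".join(re.escape(code) for code in codes))
    return pattern.subn(lambda match: codes[match.group(0)], text)


if __name__ == "__main__":
    new_text, count = replace_codes("+t+at w+as")
    print(new_text, count)
```

The sketch works on a single string only. Applying it to the corpus files would also require knowing their encoding and making sure that the .lst and .cpd files in the same directory are left untouched, which is why that step is omitted here.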

The problem of yogh

In the ZIP file there is another file for doing replacements in Helsinki Corpus texts, namely Helsinki_Codes_with_Yogh.lst. The following additional lines can be found in this file:

+g   3   
+G   3   

These replace all instances of +g and +G, the representation of yogh in the Helsinki Corpus texts, with the number 3 (Arabic numerals have no separate uppercase and lowercase forms, hence the same replacement in both cases). The only problem here is that the yogh of earlier English is not really a 3 (the number ‘three’). If you do carry out this replacement in the Helsinki Corpus texts, you will have to remember to enter 3 every time you search in Corpus Presenter for a string which has yogh (= 3) in it. This is messy, I admit, but it is a workable solution because 3 instead of +g definitely makes the texts more readable.
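The two lines quoted above can be read as pairs of search form and replacement. The following Python sketch illustrates that reading and the consequence for searching. It assumes that each line of the list consists of exactly two fields separated by whitespace, which is inferred only from the two lines shown. The function names parse_replacement_lines and to_corpus_query are hypothetical and are not part of Corpus Presenter.

```python
def parse_replacement_lines(lines):
    """Turn lines such as '+g   3   ' into a dict {'+g': '3'}.

    ASSUMPTION: every entry has exactly two whitespace-separated fields
    (search form, replacement). Lines of any other shape are skipped.
    """
    pairs = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue
        search_form, replacement = fields
        pairs[search_form] = replacement
    return pairs


def to_corpus_query(query, yogh_replacement="3"):
    """Rewrite a search string typed with a real yogh so that it matches
    texts in which yogh has been replaced by the number 3."""
    return query.replace("ȝ", yogh_replacement).replace("Ȝ", yogh_replacement)


if __name__ == "__main__":
    print(parse_replacement_lines(["+g   3   ", "+G   3   "]))
    print(to_corpus_query("ȝe"))
```

A helper like to_corpus_query only spares you from remembering the substitution. It does not remove the underlying ambiguity, since a genuine number 3 in a text can no longer be told apart from a yogh.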

There are two further data set files in the ZIP file: (1) CEECS.cpd, which is designed to work with the Corpus of Early English Correspondence Sampler by Terttu Nevalainen and Helena Raumolin-Brunberg, and (2) Old_Scots.cpd, which can be used with the Helsinki Corpus of Older Scots by Anneli Meurman-Solin.