Corpus Presenter, Version 12
Help file as separate download
Files for the Helsinki corpus
Corpus Presenter, Version 13 (c. 85 MB Zip file)
(Build: March 2013)
Version 12 of Corpus Presenter is a major upgrade of the program compared to previous versions and contains many significant enhancements. It is entirely free of charge and does not require registration or a password.
Apart from Corpus Presenter itself, I program adaptations of this corpus software for university colleagues who have prepared, or are preparing, a corpus of their own and wish to have sophisticated retrieval software to go with it. So far such adaptations have been published by the Dutch linguistics publisher John Benjamins, who also published the original of Corpus Presenter (version 7) in 2003. Any colleagues who are interested in an adaptation of Corpus Presenter for their corpus, and who would like to publish it, are advised to contact me at email@example.com.
If you have installed a previous version of Corpus Presenter on your computer, please remove it first (via the Add/Remove Software module in the Control Panel). Delete the directory C:\Program Files\Corpus Presenter as well.
The file you download is a ZIP file, which you must unzip (to any directory you like). When unzipping, make sure that you tick the box ‘Use Folder Names’ in the window that appears when you choose the Extract command. After unzipping the files, run setup.exe, which you will find among the files extracted from the ZIP file (in the directory \Corpus_Presenter). The setup procedure will suggest installing Corpus Presenter to the directory C:\Program Files\Corpus Presenter; accept this suggestion. Once the installation is complete (and you have restarted your computer), there will be a folder called Corpus Presenter on your desktop and an entry Corpus Presenter in the Programs list, which you open from the Windows Start button in the bottom left-hand corner of the screen.
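If you prefer to script the unzipping step, the sketch below uses Python's standard zipfile module. Its extractall() method recreates the folder structure stored in the archive, which has the same effect as ticking ‘Use Folder Names’. The function name, the archive name and the target directory are hypothetical; setup.exe itself still has to be started by hand as described above.

```python
import zipfile
from pathlib import Path


def extract_with_folders(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir, keeping the folder
    structure stored in the archive (like ticking 'Use Folder Names').

    Returns the path of the first setup.exe found, or None if there is none.
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # The installer is expected in the \Corpus_Presenter folder of the archive.
    return next(dest.rglob("setup.exe"), None)
```

For example, extract_with_folders("Corpus_Presenter.zip", r"C:\Temp\CorpusPresenter") would unpack the (hypothetically named) archive and return the location of setup.exe.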
Corpus Presenter works best with Windows XP / Windows Vista / Windows 7. If you use Windows XP, make sure that you have installed the latest Service Pack, currently Service Pack 3, June 2009, available from Microsoft at the following address:
http://www.microsoft.com/downloads (Windows XP, Service Pack 3)
I do not advise using versions of the operating system older than Windows XP (i.e. not Windows 2000 and certainly not Windows 98). For legal reasons, I must stress that you use Corpus Presenter at your own risk. The program can easily be removed from your computer via the Add/Remove Software module in the Control Panel of Windows XP / Windows Vista / Windows 7.
The current version of Corpus Presenter is Version 13, build: March 2013 (see download link above). It follows Version 12 (September 2012), Version 11 (February 2009) and previous versions, including Version 7 supplied with the book. You do not need previous versions to run Version 13. You can update from the CD with the book (Version 7) directly to Version 13.
The book Corpus Presenter can be purchased from John Benjamins (see the relevant section of their website). If you or your library purchases the book, you are entitled to support from the author.
If you wish, you can download the help file for Corpus Presenter on its own (it is already included in the larger download of the whole package above).
Corpus Presenter, Version 12 (help file, c. 10 MB Zip file)
Helsinki Corpus Files (size: 91 KB)
The Helsinki Corpus of English Texts consists of 242 text files, all located in a single directory. I have constructed a data set file, Helsinki_Corpus.cpd, which displays the Helsinki Corpus as a hierarchical tree divided into layers, first by period (Old, Middle and Early Modern English), then by sub-period and then by genre, as can be seen in the following screenshot.
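The internal format of Helsinki_Corpus.cpd is not described here, but the layering it produces (period, then sub-period, then genre) is an ordinary nested grouping. The Python sketch below illustrates the idea with hypothetical file names and metadata; the sub-period codes follow the Helsinki Corpus scheme (O1-O4, M1-M4, E1-E3), while the genre labels are only examples.

```python
from collections import defaultdict

# Hypothetical sample metadata: (file name, period, sub-period, genre).
# The real assignments are stored in Helsinki_Corpus.cpd, whose format is not shown here.
SAMPLE_TEXTS = [
    ("text_001.txt", "Old English", "O2", "Law"),
    ("text_002.txt", "Old English", "O2", "Homily"),
    ("text_003.txt", "Middle English", "M1", "Homily"),
    ("text_004.txt", "Early Modern English", "E1", "Drama"),
    ("text_005.txt", "Old English", "O2", "Law"),
]


def build_tree(texts):
    """Group files into a nested mapping: period -> sub-period -> genre -> [files].

    Dicts keep insertion order, so each layer appears in the order first seen.
    """
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for name, period, sub_period, genre in texts:
        tree[period][sub_period][genre].append(name)
    return tree


def render_tree(tree):
    """Return the tree as indented text lines, one indentation step per layer."""
    lines = []
    for period, sub_periods in tree.items():
        lines.append(period)
        for sub_period, genres in sub_periods.items():
            lines.append("  " + sub_period)
            for genre, files in genres.items():
                lines.append("    " + genre + ": " + ", ".join(files))
    return lines


if __name__ == "__main__":
    print("\n".join(render_tree(build_tree(SAMPLE_TEXTS))))
```

Keeping the grouping separate from the display means the same nested mapping could feed a tree widget rather than indented text.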
The second file is intended for use with the supplied utility Corpus Presenter Find Text. This file, Helsinki_Codes.lst, replaces the sequences of "+" plus a letter with the actual Old and Middle English symbols ash (æ), thorn (þ) and eth (ð) in all texts of the corpus that contain them. This makes the Old and Middle English texts much more readable. Bear in mind that the symbols ash, thorn and eth can be accessed in Corpus Presenter modules by clicking the OE/ME button, e.g. in the search options window of the Basic search or in the parameters window of the Advanced search level.
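The kind of substitution Helsinki_Codes.lst performs can be sketched as a simple lookup table. The code-to-character pairs below are an assumption based on the Helsinki Corpus "+letter" convention (+a for ash, +t for thorn, +d for eth, with capitals for the uppercase letters); the authoritative list is the .lst file itself, whose format is not shown here, and decode_helsinki is a hypothetical name.

```python
# Assumed mapping following the Helsinki "+letter" convention;
# the authoritative list is Helsinki_Codes.lst.
HELSINKI_CODES = {
    "+a": "æ", "+A": "Æ",  # ash
    "+t": "þ", "+T": "Þ",  # thorn
    "+d": "ð", "+D": "Ð",  # eth
}


def decode_helsinki(text):
    """Replace each two-character code with the Old/Middle English letter."""
    for code, char in HELSINKI_CODES.items():
        text = text.replace(code, char)
    return text
```

Because every replacement character is a single letter without "+", the order in which the codes are applied does not matter: decode_helsinki("+t+at is +de") gives "þæt is ðe".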
To carry out the replacements, do the following. Unzip the download file Helsinki.zip from the link above into the directory in which you keep the files of the Helsinki Corpus. Start Corpus Presenter Find Text and navigate to this directory. Choose Helsinki_Codes.lst as the input file of forms for the Find / Replace operation. Select all the forms and click the Proceed button. When all the files have been processed, all replacements will have been made, some 202,550 in all. The procedure may take several minutes; this is normal.
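For readers who want to see what such a batch Find / Replace amounts to, here is a rough Python equivalent that processes every file in a directory and reports how many replacements were made. It is not how Corpus Presenter Find Text works internally; the code table, the "*.txt" file pattern and the UTF-8 encoding are assumptions, and the files are overwritten, so try it on a copy of the corpus.

```python
from pathlib import Path

# Assumed code -> character table (Helsinki "+letter" convention); the
# authoritative list is Helsinki_Codes.lst, whose format is not shown here.
CODES = {"+a": "æ", "+A": "Æ", "+t": "þ", "+T": "Þ", "+d": "ð", "+D": "Ð"}


def replace_in_directory(corpus_dir, codes, pattern="*.txt", encoding="utf-8"):
    """Apply every replacement in `codes` to each file matching `pattern`
    in corpus_dir, rewriting the files in place.

    Returns the total number of replacements made.
    """
    total = 0
    for path in sorted(Path(corpus_dir).glob(pattern)):
        text = path.read_text(encoding=encoding)
        for code, char in codes.items():
            # str.count and str.replace both match non-overlapping occurrences,
            # so the count equals the number of substitutions made.
            total += text.count(code)
            text = text.replace(code, char)
        path.write_text(text, encoding=encoding)
    return total
```

The returned total plays the same role as the replacement count reported by Find Text once all files have been processed.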
The problem of yogh
In the ZIP file Helsinki.zip there is another file for doing replacements in Helsinki Corpus texts, namely Helsinki_Codes_with_Yogh.lst. The following additional lines can be found in this file:
These replace all instances of +g and +G, the representation of yogh (ȝ) in the Helsinki Corpus texts, with the digit 3 (there are no separate uppercase and lowercase forms of Arabic numerals, hence the same replacement in both cases). The only problem is that the earlier English letter yogh is not really a 3 (the number ‘three’). If you do carry out this replacement in the Helsinki Corpus texts, you will have to remember to enter 3 every time you search in Corpus Presenter for a string that contains yogh. This is admittedly messy, but it is a workable solution, because 3 instead of +g definitely makes the texts more readable.
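The effect of the yogh lines, and the adjustment you then have to make when searching, can be sketched in Python. Both functions are hypothetical illustrations, not part of Corpus Presenter: the first mimics the +g/+G replacement, the second expresses the rule "type 3 wherever yogh occurs" as a query rewrite.

```python
# Yogh codes as described above: +g and +G both become the digit 3.
YOGH_CODES = {"+g": "3", "+G": "3"}


def apply_yogh_replacement(text):
    """Replace the Helsinki yogh codes with the digit 3."""
    for code, digit in YOGH_CODES.items():
        text = text.replace(code, digit)
    return text


def to_search_form(query):
    """Hypothetical helper: rewrite a query typed with the letter yogh
    (lowercase ȝ or uppercase Ȝ) into the '3' form used in the converted texts."""
    return query.replace("ȝ", "3").replace("Ȝ", "3")
```

For example, apply_yogh_replacement("+gif +God") yields "3if 3od", and a search typed as "ȝif" has to be entered as "3if" to match it.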
There are two further data set files in the ZIP file Helsinki.zip: (1) CEECS.cpd, which is designed to work with the Corpus of Early English Correspondence Sampler by Terttu Nevalainen and Helena Raumolin-Brunberg, and (2) Old_Scots.cpd, which can be used with the Helsinki Corpus of Older Scots by Anneli Meurman-Solin.