
  Getting started with Corpus Presenter


  What you can do with Corpus Presenter?
  Making a data set file
  Designing a data set file with CP Make Tree
  Loading an existing corpus
  Loading your own files
  Converting files

When you start Corpus Presenter you are immediately shown the above window in which you can choose to do one of two things:


  1)   Load a data set file to work with a corpus you already have access to  

  2)   Load files directly for processing with Corpus Presenter

If you decide to load a data set file then you can (1) simply continue processing the corpus you were using last time (note that the program checks to confirm that the necessary data set file is on disk), (2) select a data set (a .CPD file) from a directory listing, (3) load the supplied A Corpus of Irish English or (4) load the supplied test corpus.

If none of these options suits you, choose the option Load text file(s) directly. The screen changes and you are presented with the following display.

There are a number of options on the directory listing level, and which steps you take now depends on what you want to do with Corpus Presenter. The following flowchart is an attempt to show what options are open to you. If you want to load files of your own, then follow the steps on the right-hand side of the display below.

Making a data set file


If you click on the button Make data set, the option indicated on the right-hand side of the above flowchart, you will see the following screen. There are two options here: (i) make a data set from a group of selected files in the current directory, or (ii) make a data set from a branch of the disk, which is shown as a tree on the left of the screen on the directory listing level. In the latter case the branches of the tree encoded in the data set and shown by Corpus Presenter correspond to those in the disk tree, and the files in the folders of the disk tree form the nodes of those branches in the tree displayed on the main level. Bear in mind that with such a tree you can search selectively through branches of the data set tree, which in effect means searching through branches of the disk tree of your computer.
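The correspondence between a disk tree and a data set tree can be pictured with a short Python sketch. It is only an illustration of the idea: it does not read or write .CPD files, whose format is not described here, and the names build_tree and list_nodes are hypothetical helpers, not part of Corpus Presenter.

```python
import os

def build_tree(root):
    """Mirror the disk tree under `root` as a nested dict (hypothetical layout).

    Folders become "branches", files become "nodes". This is NOT the .CPD format.
    """
    tree = {
        "name": os.path.basename(os.path.normpath(root)),
        "branches": [],
        "nodes": [],
    }
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if os.path.isdir(path):
            tree["branches"].append(build_tree(path))  # sub-folder -> sub-branch
        else:
            tree["nodes"].append(entry)                # file -> node
    return tree

def list_nodes(tree, prefix=""):
    """Yield the relative path of every file node below one branch."""
    here = os.path.join(prefix, tree["name"])
    for name in tree["nodes"]:
        yield os.path.join(here, name)
    for branch in tree["branches"]:
        yield from list_nodes(branch, here)
```

Calling list_nodes on a single entry of tree["branches"] rather than on the whole tree restricts the listing to that branch, which is the sense in which a search through one branch of the data set tree is a search through one branch of the disk tree.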


Copy selected files from a loaded corpus


You can also make a data set by copying files from a corpus you have loaded. This is done in the Checked files display mode on the desktop level. For more information on how to make a data set by these means, consult the module Making a data set by copying files from a corpus.


Designing a data set file with Corpus Presenter Make Tree


You can design/edit data set files interactively with the utility Corpus Presenter Make Tree (new in Version 10.0). You simply load a supplied data set file and alter it to suit your needs. You can also start a data set from scratch if you wish. To see what editing data set files is like, I suggest you load any of the three data set files supplied with A Corpus of Irish English and experiment with them within Corpus Presenter Make Tree.

Click here for more information about Corpus Presenter Make Tree


Loading an existing corpus


The first time you use Corpus Presenter you should try out one of the test corpora which have been supplied. On the CD accompanying the book there are two: A Corpus of Irish English and a smaller one, simply called Test Corpus. The latter contains a variety of data types: texts, databases, images and sound files. Select the data set file TEST_CP.CPD on the directory listing level (if the initial help text is displayed when you first load Corpus Presenter, you can instead click on the button Load supplied corpus).

The data set file TEST_CP.CPD contains references to all the supplied data files which are then displayed in a structured tree form. By clicking on a node of the tree you can view the file which is associated with this node. The same holds for SmallSampleCorpus.cpd and A Corpus of Irish English.

Loading your own files


When the dialogue window Open a data set appears, either at start-up or when you choose to work with a new corpus from within Corpus Presenter (press Ctrl-O for this), click on the button Load text file(s) directly. Then select the files you wish to use and click on the Load button in the bottom left-hand corner of the screen, or just press Shift-F12. The files are then displayed as a 'flat' tree with just one level, on which the name of each file is to be found. You can also press Ctrl-D from the main level of Corpus Presenter to load files directly. See the flowchart above for information on file types.


Converting files from one type to another


As you can see from the following screen shot, the option Convert files on the directory listing level allows six types of file conversion. Bear in mind that if your primary aim is to comb through files for text strings, then it might be sensible to convert these into plain ASCII files for this task. Retrieval is fastest with this type; you can format the returns later if you decide to import them into your word processor.
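To give an idea of what reducing a text to plain ASCII involves, here is a minimal Python sketch. It is not the conversion routine used by Corpus Presenter; the function name to_plain_ascii is hypothetical, and the method (Unicode decomposition followed by discarding every non-ASCII character) is just one simple, lossy approach.

```python
import unicodedata

def to_plain_ascii(text):
    """Return a lossy plain-ASCII version of `text` (illustrative only).

    Accented letters are split into base letter plus combining mark (NFKD),
    then everything outside the ASCII range is dropped.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

sample = to_plain_ascii("café naïve")
```

Note that characters with no ASCII base letter (for instance the Irish or Old English letters found in some historical texts) simply disappear with this method, so a conversion of this kind suits string retrieval better than it suits archiving.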


Because many recent corpora have been using the XML protocol for text formatting, I have added a short explanatory text:

What is XML?