[Standard query | Lemma query | Browse a file | Word lookup | Scan keywords/titles | Explore genre labels | Frequency lists | User settings | Query history | Saved queries | Create/Edit subcorpora | Post-query options ]

Scan Keywords/Titles

The Scan Keywords/Titles feature retrieves a list of BNC text files on the basis of the classification contained in the "title" and "keyword" elements of the file headers. As with the Explore genre labels feature, the resulting list of texts can be used to create a subcorpus.

This function involves three basic steps:

  1. Specify keywords/title words to scan
  2. Choose the files you require
  3. Choose whether to make a new subcorpus, or add to a pre-existing one

Each of these steps is explained in more detail below.


Step 1: Type in a search word or phrase and choose to search either via Keywords (using COPAC or BNC1 descriptive keywords) or via the Titles of the files.


Keyword Search

Use this if you want to retrieve a list of written BNC files which match a particular keyword, or set of keywords, that you have in mind. For example, if you want to create a subcorpus of published texts on the general subject of India, simply type India as your keyword and specify the COPAC library catalogue (see Note 1).

Searching by COPAC keywords is the default, and restricts you to published, written texts (see Note 2). If you click on the drop-down menu, you will see that there is another type of keyword, labelled "descriptive keywords - BNC1 release". This refers to the original set of keywords entered by the compilers of version 1 of the BNC. In general, these "BNC1 keywords" are less systematic and less useful than COPAC keywords because they do not meet professional library cataloguing standards.

Title Search

Each BNC file (both spoken and written) has a 'title' associated with it, embedded in the file header. This title is sometimes useful for identifying what the text is about: for published, written texts, it is often part of the title of the book/article/text; for spoken texts, it tells you about the context of the recording and the number of participants. For example, the title for file A07 includes the words "The tragedy of belief." If you are looking for BNC texts on belief systems (religion), you can search for all files which include the word 'belief' in their title by choosing this option. In general, however, it is far more efficient to search for files on the basis of Keywords, as explained above. The keywords for A07, for example, are more informative than its actual title, as the following excerpt from the COPAC keywords entry for the file shows: "Ireland. Catholic Church. Relations with state; Ireland - Church history - 20th century ; Church and state - Northern Ireland ; Irish question".

The "Match" option applies to the word(s) that you type into the box for Keyword(s) or the box for Title word(s). The default setting ("all words") means that all of the words you type must appear in the keywords or title entry for a match to be made. Choosing "any word" means that a BNC file will be included if any one of the words you type matches its entry. For example, if you search for the keywords "health service" using "Match any word", you will retrieve files to do with health and the National Health Service, but also files to do with any other kind of service (e.g. 'military service', 'railway services', and so on).
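The difference between the two "Match" settings can be illustrated with a short Python sketch. This is not the tool's actual implementation: the function names, the file IDs (X01 to X04) and the keyword entries are all made up for illustration, and the sketch assumes case-insensitive substring matching, which is merely consistent with 'service' matching 'railway services' in the example above.

```python
def entry_matches(query: str, entry: str, mode: str = "all") -> bool:
    """Check one keywords/title entry against the typed words.

    Assumption: each typed word matches if it occurs anywhere in the
    entry, ignoring case. The real matching rules may differ.
    """
    words = query.lower().split()
    text = entry.lower()
    if not words:
        return False  # nothing typed, so nothing can match
    if mode == "all":
        # "all words": every typed word must appear in the entry
        return all(word in text for word in words)
    if mode == "any":
        # "any word": one matching word is enough
        return any(word in text for word in words)
    raise ValueError("mode must be 'all' or 'any'")


def scan(entries: dict, query: str, mode: str = "all") -> list:
    """Return the IDs of all files whose entry matches the query."""
    return [file_id for file_id, entry in entries.items()
            if entry_matches(query, entry, mode)]


# Hypothetical file IDs and keyword entries, for illustration only.
entries = {
    "X01": "National Health Service - Great Britain",
    "X02": "Military service - History",
    "X03": "Railway services - Scotland",
    "X04": "Public health - Wales",
}

print(scan(entries, "health service", mode="all"))
print(scan(entries, "health service", mode="any"))
```

With these invented entries, "all words" returns only X01, whereas "any word" returns all four files, including the unrelated 'service' texts. This is why "all words" is usually the safer choice for multi-word searches.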

Step 2: Choose the files you require. The screenshot below shows the results of a COPAC keywords search for "India", and the further options available.



You can select individual files to include by clicking the relevant box on the right, or simply choose "include all files" at the bottom. 

Step 3: You can now either "Add" the chosen files to a "New subcorpus" (the default) or add them to a pre-existing subcorpus (e.g. any of the previously created subcorpora listed in the screenshot above). The screenshot below shows the resulting page when the user chooses to add the above three files to a pre-existing subcorpus called "test2":



You are now ready to perform a Subcorpus query.

Notes

  1. COPAC is the joint academic libraries catalogue system for the UK, and contains library entries for published books/journals, including information about authors and text keywords entered by professional library cataloguers. More information about how this was used to correct errors and add information to the BNC World Edition is available here.
  2. There are also 10 spoken texts which were erroneously given COPAC keywords in the text headers: F7E, F7F, F7G, HYC, HYD, JNR, JNS, KRT, KS3, KS6.

 
