[Standard query | Lemma query | Browse a file | Word lookup | Scan keywords/titles | Explore genre labels | Frequency lists | User settings | Query history | Saved queries | Create/Edit subcorpora | Post-query options ]

Frequency lists

The Frequency list feature makes it possible to compile ordered lists of lexical items and lemma forms based on a number of user-definable options. The following table lists the options available for retrieving frequency lists of lexical items (<w>-units). With the exception of the first row, all options also apply to the compilation of lemma-based frequency lists. (The table is not an exact screenshot but a slightly altered one, with an additional column of explanatory notes.)

Text

Option/Button

Explanation/Hint

POS-tag frequencies

Choose one or several POS-tags:

To select multiple items within one slot/position (e.g. all tags beginning with VB):
  • Continuous selection: click on the first item in the list (e.g. VBB), then hold the "Shift" key and click on the last item (e.g. VBZ)
  • Discontinuous selection:
    • Windows keyboard: hold the "Control" key and left-click on each individual item
    • Macintosh keyboard: hold the "Command" key and click on each individual item

Show words: Applies a regular expression pattern as a filter. You can enter a string of alphanumeric characters (e.g. ing) in order to retrieve words that start with, end in, contain or correspond to your input. You may also enter a regular expression to apply a more flexible filter (e.g. un.{7,9}ness, which retrieves instances where un is followed by between 7 and 9 characters, followed by ness; see screenshot below). The supported regular expression syntax is described in the MySQL user manual.

Range of texts:

Determines whether the frequency list is based on the whole corpus or only on its spoken or written component. Other types of restrictions (e.g. text domain, age of author, etc.) are not available because the necessary databases would require vast amounts of disk space.

Range of frequency (optional):

From/to: Restricts the frequency list to items whose frequency falls within the specified range.

Type of ordering:

Determines whether the most frequent item is displayed first or last in the list.

Number of items shown per page:

Setting this option to a higher value can save you time if you are interested in more than the first few items of the resulting list. Although it is possible to navigate within the frequency list (see below), changing the page may take a long time because the query has to be re-performed for each individual page.

Show individual tag frequencies:

If you selected several tags (or a range of tags such as any noun), you can choose whether the items in the frequency list are grouped together by lexical/lemma form or whether the frequency of each word-tag pair is displayed separately.
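The Show words option described in the table can be pictured as a regular-expression test applied to each candidate word form. The Python sketch below is only an approximation: BNCweb hands the pattern to MySQL, whose regular expression engine differs from Python's `re` in some details, and the `filter_words` function and its mode names (`start`, `end`, `contain`, `exact`) are hypothetical stand-ins for the interface choices, not BNCweb code.

```python
import re


def filter_words(words, pattern, mode="contain"):
    """Return the words that match `pattern` under the given mode.

    Hypothetical modes modelled on the 'Show words' choices:
    'start' (starts with), 'end' (ends in), 'contain', 'exact'.
    """
    templates = {
        "start": "^(?:{})",
        "end": "(?:{})$",
        "contain": "{}",
        "exact": "^(?:{})$",
    }
    regex = re.compile(templates[mode].format(pattern))
    # re.search matches anywhere in the word; anchoring comes from the mode.
    return [w for w in words if regex.search(w)]


nouns = ["unconsciousness", "unhappiness", "unwillingness", "kindness"]
print(filter_words(nouns, r"un.{7,9}ness"))  # ['unconsciousness', 'unwillingness']
print(filter_words(["walking", "ingot", "sing"], "ing", "end"))  # ['walking', 'sing']
```

Note that unhappiness is rejected because only five characters separate un from ness.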

The following screenshot shows the first 15 entries in a frequency list for nouns (with POS-tag NN1), filtered with the regular expression un.{7,9}ness:

You may navigate through the frequency list with the help of these three elements:

Since the query for the compilation of the frequency list has to be re-performed, navigation between pages may take some time. Do not click on the link again if you don't get an immediate response, as this would only slow down the server for you and other users.
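The cost of page navigation follows from how the list is produced. The Python sketch below is a simplified model, not BNCweb's implementation: it assumes the corpus is available as a plain list of tokens (BNCweb itself queries MySQL tables), and `compile_frequency_list` and `get_page` are hypothetical names. It shows the frequency range, ordering and items-per-page options working together, and why each page request repeats the full compilation.

```python
from collections import Counter


def compile_frequency_list(tokens, min_freq=None, max_freq=None, descending=True):
    """Count word forms, keep those inside the frequency range, and sort them."""
    counts = Counter(tokens)
    items = [
        (word, freq)
        for word, freq in counts.items()
        if (min_freq is None or freq >= min_freq)
        and (max_freq is None or freq <= max_freq)
    ]
    if descending:
        # Most frequent first; ties broken alphabetically so pages stay stable.
        items.sort(key=lambda item: (-item[1], item[0]))
    else:
        # Least frequent first, again with alphabetical tie-breaking.
        items.sort(key=lambda item: (item[1], item[0]))
    return items


def get_page(tokens, page, per_page, **options):
    """Return page `page` (1-based) of the frequency list.

    Every call recompiles the complete list before slicing out one page,
    mimicking a query that is re-performed for each page.
    """
    items = compile_frequency_list(tokens, **options)
    start = (page - 1) * per_page
    return items[start:start + per_page]


tokens = "the cat saw the dog and the dog saw a cat".split()
print(get_page(tokens, 1, 2))  # [('the', 3), ('cat', 2)]
print(get_page(tokens, 2, 2))  # [('dog', 2), ('saw', 2)]
print(get_page(tokens, 1, 10, min_freq=2, max_freq=2))  # [('cat', 2), ('dog', 2), ('saw', 2)]
```

Because `get_page` recompiles everything on each call, requesting more items per page means fewer recompilations for the same portion of the list, which matches the advice given for the Number of items shown per page option.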

Clicking on a word performs a search for the word-tag combination (or lemma-lemmatype combination in the case of a lemma frequency list) and displays the solutions in a BNC query result. This option is only available if the items displayed are based on a single POS-tag. If several POS-tags are grouped together (see explanation above), no link will be available.
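The difference between the two displays offered by Show individual tag frequencies, and why grouped entries cannot be linked to a single word-tag search, can be sketched in Python as follows. The function names are hypothetical, the input is a toy list of (word, tag) pairs using BNC C5 tags only as labels, and `links_available` encodes one reading of the rule above (a link needs each entry to stand for exactly one word-tag combination), not confirmed BNCweb logic.

```python
from collections import Counter, defaultdict


def pair_frequencies(tagged_tokens):
    """Separate count for every (word, tag) pair."""
    return Counter(tagged_tokens)


def grouped_frequencies(tagged_tokens):
    """One entry per word form: total count plus the tags it occurred with."""
    counts = Counter()
    tags = defaultdict(set)
    for word, tag in tagged_tokens:
        counts[word] += 1
        tags[word].add(tag)
    return {word: (counts[word], sorted(tags[word])) for word in counts}


def links_available(selected_tags, show_individual_tags):
    """Hypothetical rule: each entry must correspond to one word-tag pair."""
    return len(selected_tags) == 1 or show_individual_tags


tagged = [("run", "NN1"), ("run", "VVB"), ("run", "NN1"), ("walk", "NN1")]
print(pair_frequencies(tagged)[("run", "NN1")])  # 2
print(grouped_frequencies(tagged)["run"])        # (3, ['NN1', 'VVB'])
print(links_available(["NN1", "VVB"], False))    # False
```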

You can also save the whole frequency list to your hard disk by selecting Download whole Frequency List in the drop-down menu. Please note that although BNCweb does not stop you from downloading a frequency list of all lexical items in the BNC, this will take a long time to complete over a slow connection. Instead, you may want to set a lower limit for the number of occurrences needed for inclusion in the list.

HINT: With the help of the frequency list feature, it is (to some extent) possible to get around the limitation that SARA cannot perform queries for POS-tags directly. For example, if you are interested in intensification, you would ideally want to look for all instances where an adverb precedes an adjective. Such a query is, however, impossible with BNCweb; you will have to work with a list of lexical items instead. But which lexical items? Using the frequency list feature, you can determine which adverbs ending in -ly are the most frequent. The top 25 entries of this list cover a sizeable proportion of all adverbs in the BNC. Thus, if you can say that you have checked intensification with these 25 most frequent adverbs ending in -ly, your methodological basis is more sound than if the same study had been based on a list compiled by intuition.
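The arithmetic behind this hint is a simple coverage ratio: the combined frequency of the top N entries divided by the total frequency of all entries considered. The Python sketch below computes it from a word-to-frequency mapping such as a downloaded frequency list might provide. The function names and the counts are invented for illustration (they are not BNC figures), and coverage is measured here against the -ly adverbs in the supplied counts only.

```python
def ly_adverbs(adverb_freqs):
    """Keep only adverbs ending in -ly (like an 'ends in ly' word filter)."""
    return {word: freq for word, freq in adverb_freqs.items() if word.endswith("ly")}


def top_n_coverage(freqs, n):
    """Proportion of all tokens in `freqs` covered by its n most frequent items."""
    total = sum(freqs.values())
    if total == 0:
        return 0.0
    top = sorted(freqs.values(), reverse=True)[:n]
    return sum(top) / total


# Invented counts for illustration only; these are not BNC figures.
adverbs = {"very": 200, "really": 50, "extremely": 30, "highly": 15, "fairly": 5}
ly = ly_adverbs(adverbs)
print(sorted(ly, key=ly.get, reverse=True)[:2])  # ['really', 'extremely']
print(top_n_coverage(ly, 2))                     # 0.8
```

With real data, the same calculation with n=25 would quantify how much of the -ly adverb population a study based on the 25 most frequent items actually covers.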

 

Notes

  1. The word-frequencies returned by this feature may occasionally differ slightly from the number of hits retrieved by a standard query. This is mostly due to differences in how multiword units are treated. We have tried to minimise the discrepancies but may not always have been successful.
  2. You might also find it useful to consult the web site which accompanies the following book:

    Leech, G., Rayson, P., and Wilson, A. (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.

    It contains frequency lists for the whole BNC (version 1), for the spoken versus written components, for the conversational (i.e. demographic) versus task-oriented (i.e. context-governed) parts of the spoken component, and for the imaginative versus informative parts of the written component. It also has ranked frequency word lists according to parts of speech (e.g. all nouns, all conjunctions) based on the whole BNC corpus (version 1), as well as frequencies for individual part-of-speech tags (e.g. NN1, VDG) based on the BNC Sampler.
    Although the frequency lists for this book were based on all 4124 files of the original BNC version 1 corpus, the text classifications and POS tags used were the updated and more accurate ones implemented in the BNC World Edition.