[Standard query | Lemma query | Browse a file | Word lookup | Scan keywords/titles | Explore genre labels | Frequency lists | User settings | Query history | Saved queries | Create/Edit subcorpora | Post-query options ] |
Frequency lists |
The Frequency list feature makes it possible to compile ordered lists of lexical items and lemma forms based on a number of user-definable options. The following table lists the options available for retrieving frequency lists of lexical items (<w>-units). With the exception of the first row, all options also apply to lemma-based frequency lists. (The table is not a real screenshot but a slightly altered one, with an additional column of explanatory notes.)
Text |
Option/Button |
Explanation/Hint |
POS-tag frequencies |
||
Choose one or several POS-tags: |
To select multiple items within one slot/position (e.g. all tags
beginning VB-):
|
|
Show words | Applies a regular expression pattern as a filter. You can enter a string of alphanumeric characters (e.g. ing) to retrieve words that start with, end in, or contain your input. You may also enter a regular expression to apply a more flexible filter (e.g. un.{7,9}ness, which retrieves words in which un is followed by between 7 and 9 characters and then by ness; see the screenshot below). The supported regular expression syntax is described in the MySQL user manual. | |
|
Selects whether the frequency list will be based on the whole corpus or only on the spoken or the written component. Other types of restrictions (e.g. text domain, age of author, etc.) are not available because the necessary databases would require vast amounts of disk space. | |
|
from to | Reduces the frequency list to items which occur within a certain frequency range. |
|
Influences whether the most frequent item will be displayed first or last in the list. | |
|
Setting this option to a higher value can save you time if you are interested in more than the topmost few items of the resulting list. It is possible to navigate within the frequency list (see below), but changing the page may take a long time because the query has to be re-run for each individual page. | |
|
If you selected several tags (or a range of tags such as any noun), you can choose whether the items in the frequency list will be grouped together by lexical/lemma form or whether the frequency for each word-tag pair will be displayed separately. | |
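The options in the table above can be pictured as a small pipeline: filter by tag and pattern, count, restrict to a frequency range, sort, and optionally group by word form. The following is a minimal Python sketch of that idea, not BNCweb's actual implementation (which queries MySQL tables). The toy data, the helper name frequency_list and all of its parameters are assumptions made for illustration. The anchor $ stands in for the "end in" menu option, and Python's regular expression syntax is assumed to agree with MySQL's for simple patterns such as the ones used here.

```python
import re
from collections import Counter

# Toy (word, POS-tag) tokens, invented for illustration; not BNC data.
TOKENS = [
    ("unhappiness", "NN1"), ("unhappiness", "NN1"),
    ("unpleasantness", "NN1"),
    ("unwillingness", "NN1"), ("unwillingness", "NN1"), ("unwillingness", "NN1"),
    ("kindness", "NN1"),
    ("walking", "VVG"), ("walking", "NN1"), ("walking", "VVG"),
]

def frequency_list(tokens, pattern, tags, min_freq=1, max_freq=None,
                   descending=True, group_by_word=False):
    """Hypothetical helper that mimics the options described in the table."""
    regex = re.compile(pattern)
    counts = Counter()
    for word, tag in tokens:
        # Keep only tokens with a selected tag whose word matches the filter.
        if tag in tags and regex.search(word):
            # Either one row per word form, or one row per word-tag pair.
            key = word if group_by_word else (word, tag)
            counts[key] += 1
    # Restrict to the requested frequency range ("from ... to ...").
    rows = [(key, n) for key, n in counts.items()
            if n >= min_freq and (max_freq is None or n <= max_freq)]
    # Most frequent first or last; ties are broken alphabetically.
    sign = -1 if descending else 1
    rows.sort(key=lambda row: (sign * row[1], str(row[0])))
    return rows

# "unhappiness" is excluded: only 5 characters separate "un" and "ness".
print(frequency_list(TOKENS, r"un.{7,9}ness", {"NN1"}))
# Grouped by word form versus listed per word-tag pair.
print(frequency_list(TOKENS, r"ing$", {"VVG", "NN1"}, group_by_word=True))
print(frequency_list(TOKENS, r"ing$", {"VVG", "NN1"}))
```

In the grouped call, walking appears once with a combined count; in the ungrouped call it appears twice, once per tag.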
The following screenshot shows the first 15 entries in a frequency list for nouns (with POS-tag NN1), filtered with the regular expression un.{7,9}ness:
You may navigate through the frequency list with the help of these three elements:
Since the query for the compilation of the frequency list has to be re-performed, navigation between different pages may take some time. Do not click on the link again if you don't get an immediate response as this would only slow down the server for yourself and other users.
Clicking on a word performs a search for the word-tag combination (or lemma-lemmatype combination in the case of a lemma frequency list) and displays the solutions in a BNC query result. This option is only available if the items displayed are based on a single POS-tag. If several POS-tags are grouped together (see explanation above), no link will be available.
You can also save the whole frequency list to your hard disk by selecting Download whole Frequency List in the pull-down menu. Please note that while BNCweb does not stop you from downloading a frequency list of all lexical items in the BNC, this will take a long time to complete over a slow connection. You may instead want to set a lower limit for the number of occurrences needed for inclusion in the list.
HINT: With the help of the frequency list feature, it is (to some extent) possible to get around the limitation that SARA cannot perform queries for POS-tags from the outset. For example, if you are interested in intensification, you would ideally want to look for all instances where an adverb precedes an adjective. This is, however, impossible with BNCweb: you will have to work with a list of lexical items instead. But which lexical items? Using the frequency list feature, you can determine which adverbs ending in -ly are the most frequent. The 25 top entries of this list cover a sizeable proportion of all adverbs in the BNC. Thus, if you can say that you have checked intensification with these 25 most frequent adverbs ending in -ly, your methodological basis is sounder than if the same study were done on the basis of a list compiled by intuition.
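The coverage argument in this hint can be checked with a few lines of arithmetic once a frequency list has been downloaded. The sketch below assumes the list has already been parsed into (word, frequency) pairs; the counts are invented placeholders rather than BNC figures, and coverage is a hypothetical helper, not part of BNCweb.

```python
def coverage(freq_list, top_n):
    """Share of all tokens accounted for by the top_n most frequent items."""
    ranked = sorted(freq_list, key=lambda item: item[1], reverse=True)
    total = sum(freq for _, freq in ranked)
    if total == 0:
        return 0.0
    covered = sum(freq for _, freq in ranked[:top_n])
    return covered / total

# Invented counts for illustration only; not real BNC frequencies.
ly_adverbs = [("really", 50), ("only", 30), ("clearly", 10),
              ("slowly", 6), ("oddly", 4)]

print(f"{coverage(ly_adverbs, 2):.0%}")  # prints "80%"
```

Reporting this share alongside the list of adverbs actually examined makes it explicit how much of the data a study based on the top entries accounts for.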