Inherent limitations
BNCweb inherits many of its strengths from the SARA server software. However, it also derives from SARA
some of its limitations.
Perhaps the most noticeable limitation is that a BNC query always has to start out as a lexical query.
In other words, it is not possible to search for grammatical patterns (using POS-tags) from the outset. The
following searches therefore cannot be carried out:
- Find all nouns in the BNC
  (But you can compile a Frequency list of all nouns and then perform queries individually.)
- Find any instance where pretty is followed by an adjective
  (But you can first search for pretty and then use the Sort feature to retrieve this data.)
- Find any instance where pretty is followed within five words by a noun
  (But you can first search for pretty and then use the Tag sequence search feature to retrieve this data.)
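The two-step logic behind these workarounds (a lexical query first, a tag-based filter second) can be illustrated with a small Python sketch. It is a toy model only and does not reflect how BNCweb or SARA work internally: the function names, the sample tokens and the tag sets are assumptions made for illustration. The tags follow the BNC's CLAWS5 scheme.

```python
# Toy illustration only: this does not call BNCweb or SARA.
# Tokens are (word, POS-tag) pairs; the sample data below is invented.

ADJECTIVE_TAGS = {"AJ0", "AJC", "AJS"}      # assumed set of adjective tags
NOUN_TAGS = {"NN0", "NN1", "NN2", "NP0"}    # assumed set of noun tags


def lexical_hits(tokens, word):
    """Step 1: positions of the node word (the purely lexical query)."""
    return [i for i, (w, _) in enumerate(tokens) if w.lower() == word]


def followed_by(tokens, hits, tags, window=1):
    """Step 2: keep hits with a matching tag within `window` tokens to the right."""
    kept = []
    for i in hits:
        following = tokens[i + 1 : i + 1 + window]
        if any(tag in tags for _, tag in following):
            kept.append(i)
    return kept


sample = [
    ("a", "AT0"), ("pretty", "AV0"), ("good", "AJ0"), ("idea", "NN1"),
    ("and", "CJC"), ("a", "AT0"), ("pretty", "AJ0"), ("garden", "NN1"),
]

hits = lexical_hits(sample, "pretty")
adjective_next = followed_by(sample, hits, ADJECTIVE_TAGS, window=1)
noun_within_five = followed_by(sample, hits, NOUN_TAGS, window=5)
```

In this sketch the grammatical condition is applied only to the hits of the lexical search, which mirrors the order of operations described above.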
In its current version (2.0), BNCweb cannot restrict searches to user-defined subcorpora from the outset.
Instead, a query must first be run over the whole BNC and then restricted to the subcorpus of your choice.
It is, however, possible to restrict queries by a whole range of metatextual categories from the outset.
Please consult the Standard query and Create/edit subcorpora manual pages for clarification.
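Conceptually, the "query first, restrict afterwards" procedure amounts to post-filtering a result set by text. The following Python sketch illustrates the idea under stated assumptions: the hit representation, the function name and the sample text identifiers are hypothetical and are not taken from BNCweb.

```python
# Hypothetical illustration: each hit is a (text_id, sentence_number) pair.
# The identifiers and numbers below are invented sample values.

def restrict_to_subcorpus(hits, subcorpus_text_ids):
    """Keep only hits that occur in texts belonging to the subcorpus."""
    return [hit for hit in hits if hit[0] in subcorpus_text_ids]


# Result of a query over the whole corpus (invented)
all_hits = [("A00", 12), ("B1C", 3), ("A00", 40), ("KB7", 101)]

# A user-defined subcorpus, represented as a set of text identifiers
my_subcorpus = {"A00", "KB7"}

restricted = restrict_to_subcorpus(all_hits, my_subcorpus)
```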
BNCweb does not offer a user-friendly interface to all functions supported by the SARA server software. Some
minor features may be added in the future. It is, however, possible to enter any query conforming to CQL query syntax
into the search box on the main page of BNCweb.
System-dependent limitations
BNCweb offers some features that are highly CPU-intensive and require a lot of disk space. Default limits
are therefore imposed on the number of hits to which the following four features can be applied:
- Collocations
- Distribution analysis
- Sort
- Tag sequence search
A warning message will be displayed when your query result has more hits than allowed. If you need
to use any of these features with a larger number of hits, contact your system administrator,
who can increase the limit globally or for individual users.
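The relationship between global limits and per-user overrides can be pictured with the following Python sketch. Everything in it is hypothetical: the dictionary names, the user name and the numeric values are placeholders chosen for illustration and are not BNCweb's actual settings or defaults.

```python
# Hypothetical sketch: names and numbers are placeholders, not BNCweb settings.

GLOBAL_LIMITS = {
    "collocations": 1000,
    "distribution": 1000,
    "sort": 1000,
    "tag_sequence_search": 1000,
}

# Per-user overrides, as an administrator might grant them
USER_LIMITS = {"alice": {"sort": 5000}}


def hit_limit(feature, user):
    """Return the user's own limit if one is set, else the global limit."""
    return USER_LIMITS.get(user, {}).get(feature, GLOBAL_LIMITS[feature])


def check_hits(feature, user, n_hits):
    """Return a warning message if the hit count is over the limit, else None."""
    limit = hit_limit(feature, user)
    if n_hits > limit:
        return f"{feature}: query has {n_hits} hits, above the limit of {limit}"
    return None
```

With these placeholder values, a result of 1,001 hits would trigger a warning for a user without an override when sorting, but not for the user whose Sort limit was raised.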
Methodological limitations
BNCweb has been designed to offer user-friendly access to a wealth of data in the BNC. It produces
descriptive statistics on the fly that would require considerable "manual" work on the part of the researcher
with other corpus linguistics tools. While this is certainly one of its advantages, it also has a drawback: in our
experience, users have sometimes shown too much enthusiasm for compiling endless lists and tables.
It is important to stress that BNCweb produces only raw data; a meaningful interpretation of this data remains the task
of the researcher. BNCweb cannot replace human intuition, but it can relieve the careful scholar of a lot of tedious work.
(See also the note on the Distribution feature.)