The OPUS CVS, wiki, and search interface are temporarily down due to a hardware failure.





OPUS - an open source parallel corpus

OPUS is an attempt to collect translated texts from the web, to convert and align the entire collection, to add linguistic data, and to provide the community with a publicly available parallel corpus. OPUS is based on open source products and is also delivered as an open source package. We used several tools to compile the current corpus; no manual corrections have been made.

The OPUS collection is extensive. New data will be available from this page and the OPUS CVS repository.
Contributions are welcome! Please contact lars.nygaard@ilf.uio.no or joerg@stp.ling.uu.se.

Search & Browse Tools

Download:


Documentation & FAQs:

Project members:

Publications

Jörg Tiedemann, Lars Nygaard, 2004
The OPUS corpus - parallel & free. [pdf]
In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04). Lisbon, Portugal, May 26-28.

Jörg Tiedemann, to appear
OPUS - an open source parallel corpus. [gzipped ps]
In Proceedings of the 13th Nordic Conference on Computational Linguistics, University of Iceland, Reykjavik, 2003.