Corpora, that is, large and uniformly encoded collections of texts, are an invaluable source of data, not only for linguists but also for Language Technology tools. Multilingual parallel corpora are especially useful, as they enable, for example, the induction of translation knowledge in the form of multilingual lexica or full-fledged machine translation models. Such corpora are even more useful if they are sentence-aligned across languages and linguistically annotated. But parallel corpora, especially large ones, are still scarce and have so far been difficult to acquire. Recently, however, a large new source of parallel texts has become available on the Web, which contains EU law texts (the Acquis Communautaire) in all the languages of the ...
The Dutch Parallel Corpus (DPC) is a translation corpus containing Dutch, English and French text sa...
Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora AEPC 2010. Editors:...
Multilingual textual basis: ParCoLab is a 12-million-word parallel corpus containing original and tran...
The paper discusses the compilation of massively multilingual corpora, the EU ACQUIS corpus, and the...
We present a new, unique and freely available parallel corpus containing European Union (EU) documen...
The paper presents the SVEZ-IJS corpus, a large parallel annotated English-Slovene corpus containing...
We are presenting a new, unique and freely available parallel corpus available in all 20 official Eu...
This paper focuses on the description of the corpus «PEST-INTER» in five languages and the process o...
With more and more text being available in electronic form, it is becoming relatively easy to obtain...
In this article we illustrate and evaluate an approach to the creation of high quality linguisticall...
The paper presents the methodology and the outcome of the compilation and the processing of the Bulg...
The EU Copernicus project Multext-East has created a multi-lingual corpus of text and speech data,...
Starting in 2006, the European Commission’s Joint Research Centre (JRC) and other European Union org...
Czech-Slovak parallel corpus consisting of several freely available corpora (Acquis [1], Europarl [2...
English-Slovak parallel corpus consisting of several freely available corpora (Acquis [1], Europarl ...