We examine methods for collecting parallel Czech-English corpora from the web. We propose and evaluate automatic methods for finding source web sites, for language identification and, most importantly, for document alignment of the obtained pages.
CzEng 1.0 is the fourth release of a sentence-parallel Czech-English corpus compiled at the Institut...
Parallel corpora are indispensable resources for a variety of multilingual natural language processi...
Sentence-parallel corpus made from English and Czech Wikipedias based on translated articles from En...
Corpus of manually aligned Czech-English parallel sentences. It comprises 2500 parallel sentences fr...
The paper describes suitable sources for creating Czech-Slovak parallel corpora, including our proce...
We describe our ongoing efforts in collecting a Czech-English parallel corpus CzEng. The paper provi...
CzEng 0.9 is the third release of a large parallel corpus of Czech and English. For the current rele...
An algorithm for creating parallel text corpora is presented. The algorithm is based on the us...
STRAND (Resnik) is a language-independent system for automatic discovery of text in parallel transl...
In our paper, we present the main results of the Czech grant project Internet as a Language Corpus, whos...
Parallel corpora are a crucial resource in research fields such as cross-lingual information retrie...
CzEng 1.0 is an updated release of our Czech-English parallel corpus, freely available for non-comme...
In this work, an extensible word-alignment framework is implemented from scratch. It is based on a d...
Multilingual resources are useful for linguistic studies, translation, and many other tasks. Unfortu...
CzEng 0.7 is a Czech-English parallel corpus compiled at the Institute of Formal and Applied Linguis...