After learning the basic principles of building parallel corpora, the student will focus on the Czech-English parallel corpus CzEng. The main goal of the work is to improve the quality of the part of CzEng created from Czech and English movie and series subtitles. Above all, it is necessary to design and implement methods for detecting wrongly aligned (or otherwise problematic) subtitle files or their parts. The impact of the cleaning methods on corpus quality will be evaluated quantitatively.
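One simple way to flag possibly misaligned subtitle files is a sentence-length-ratio check. The sketch below is only an illustration under stated assumptions and is not the method developed in the work: the function names, the whitespace tokenization, and the thresholds `max_ratio` and `max_bad_fraction` are all hypothetical choices.

```python
from typing import List, Tuple


def length_ratio(src: str, tgt: str) -> float:
    """Return the token-count ratio of the longer side to the shorter side.

    Tokens are split on whitespace (an assumption; a real pipeline would
    likely use a proper tokenizer). An empty side yields infinity.
    """
    a, b = len(src.split()), len(tgt.split())
    if a == 0 or b == 0:
        return float("inf")
    return max(a, b) / min(a, b)


def suspicious_fraction(pairs: List[Tuple[str, str]], max_ratio: float = 3.0) -> float:
    """Return the fraction of sentence pairs whose length ratio exceeds max_ratio."""
    if not pairs:
        return 0.0
    bad = sum(1 for cs, en in pairs if length_ratio(cs, en) > max_ratio)
    return bad / len(pairs)


def flag_file(pairs: List[Tuple[str, str]],
              max_ratio: float = 3.0,
              max_bad_fraction: float = 0.3) -> bool:
    """Flag a subtitle file pair when too many of its sentence pairs look implausible.

    Both thresholds are hypothetical defaults and would need tuning on real data.
    """
    return suspicious_fraction(pairs, max_ratio) > max_bad_fraction
```

A length-ratio filter is cheap and language-independent, but it catches only gross misalignment. Shifted subtitle files whose sentences happen to have similar lengths would need lexical or timing-based evidence.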
CzEng 1.0 is the fourth release of a sentence-parallel Czech-English corpus compiled at the Institut...
In this paper, we leverage the existence of dual subtitles as a source of parallel data. Dual subtit...
CsEnVi Pairwise Parallel Corpora consist of Vietnamese-Czech parallel corpus and Vietnamese-English ...
This work is about the creation of a parallel corpus whose main source is movie subtitles. In particu...
Abstract. This paper describes a methodology for constructing aligned German-Chinese corpora from m...
SUMAT is a project funded through the EU ICT Policy Support Programme (2011–2014). It involves four ...
CzEng 0.9 is the third release of a large parallel corpus of Czech and English. For the current rele...
Abstract. This paper proposes to use DTW to construct parallel corpora from difficult data. Parallel...
This paper describes a methodology for building aligned multilingual corpora from movie subtitles f...