This paper describes a methodology for building aligned multilingual corpora from movie subtitles found on the Web. The subtitles come in specific formats and encodings. In a first step, we convert them to our XML-based multilingual subtitle format. In a second step, we align the subtitle sentences based on the times at which they are displayed on the screen. We implemented the tool Jimaku to perform these steps semi-automatically. The last step consists of aligning the sentences at the sub-sentence level and indexing the corpus for contextual lookup. For this step, we use the WIMS platform, the result of previous research on text collection management.
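As an illustration of the time-based alignment step, the following is a minimal sketch, not the Jimaku implementation. It assumes SubRip-style timestamps (HH:MM:SS,mmm) and pairs each source cue with the target cue that shares the most display time. The names Cue, parse_timestamp, overlap, align_by_time and the min_overlap threshold are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Cue:
    """One subtitle unit (hypothetical structure): display interval in seconds plus text."""
    start: float
    end: float
    text: str


def parse_timestamp(ts: str) -> float:
    """Convert a SubRip-style timestamp 'HH:MM:SS,mmm' to seconds."""
    hours, minutes, rest = ts.split(":")
    seconds, millis = rest.split(",")
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def overlap(a: Cue, b: Cue) -> float:
    """Length in seconds of the interval shared by two cues (0.0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def align_by_time(
    src: List[Cue], tgt: List[Cue], min_overlap: float = 0.5
) -> List[Tuple[Cue, Optional[Cue]]]:
    """Pair each source cue with the target cue of largest time overlap.

    A source cue whose best overlap is below min_overlap (an assumed
    threshold) is paired with None, i.e. left unaligned.
    """
    pairs: List[Tuple[Cue, Optional[Cue]]] = []
    for s in src:
        best = max(tgt, key=lambda t: overlap(s, t), default=None)
        if best is not None and overlap(s, best) >= min_overlap:
            pairs.append((s, best))
        else:
            pairs.append((s, None))
    return pairs


if __name__ == "__main__":
    en = [Cue(1.0, 3.0, "Hello."), Cue(4.0, 6.0, "How are you?"), Cue(10.0, 11.0, "Bye.")]
    fr = [Cue(1.2, 3.1, "Bonjour."), Cue(4.1, 6.2, "Comment ça va ?")]
    for s, t in align_by_time(en, fr):
        print(s.text, "<->", t.text if t else None)
```

Largest-overlap matching is only one possible heuristic; it yields one-to-one or one-to-none pairs and would need extending to handle sentences split across several subtitle units.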
In this paper, we leverage the existence of dual subtitles as a source of parallel data. Dual sub...
This paper proposes to use DTW to construct parallel corpora from difficult data. Parallel...
This paper presents a method for compiling a large-scale bilingual corpus from a database of movie s...
This paper describes a methodology for constructing aligned German-Chinese corpora from m...
This work is about the creation of a parallel corpus for which movie subtitles are the main source. In particu...
SUMAT is a project funded through the EU ICT Policy Support Programme (2011–2014). It involves four ...
After learning the basic principles of building parallel corpora, the student will focus on the Czec...
This paper focuses on two aspects of Machine Translation: parallel corpora and...
A multilingual corpus of movie subtitles aligned at the sentence level. Contains data on more than 5...