<p>In this paper, we leverage the existence of dual subtitles as a source of parallel data. Dual subtitles present viewers with two languages simultaneously and are generally aligned at the segment level, which removes the need to perform this alignment automatically. This is desirable because the extracted parallel data does not contain the alignment errors present in previous work that aligns different subtitle files for the same movie. We present a simple heuristic to detect and extract dual subtitles and show that more than 20 million sentence pairs can be extracted for the Mandarin-English language pair. We also show that extracting data from this source can be a viable solution for improving Machine Translation systems in the domain of subtitles....
This work is about the creation of a parallel corpus, where movie subtitles are the main source. In particu...
We explore the usability of different bilingual corpora for the purpose of multilingual and cross-li...
Parallel corpora extracted from online repositories of movie and TV subtitles are employed in a wide...
Abstract. This paper describes a methodology for constructing aligned German-Chinese corpora from m...
This paper proposes to use DTW to construct parallel corpora from difficult da...
This paper focuses on two aspects of Machine Translation: parallel corpora and...
Abstract. This paper proposes to use DTW to construct parallel corpora from difficult data. Parallel...
SUMAT is a project funded through the EU ICT Policy Support Programme (2011–2014). It involves four ...
We present a new major release of the OpenSubtitles collection of parallel corpora. The release is c...
Due to the lack of ideal resources, few researchers have investigated how to improve the machine tra...
In statistical machine translation, large numbers of parallel sentences are required to train the mo...
This paper presents a method for compiling a large-scale bilingual corpus from a database of movie s...