Word alignments identify translational correspondences between words in a parallel sentence pair and are used, for example, to train statistical machine translation systems, to learn bilingual dictionaries, or to perform quality estimation. Subword tokenization has become a standard preprocessing step for a large number of applications, notably for state-of-the-art open-vocabulary machine translation systems. In this paper, we thoroughly study how this preprocessing step interacts with the word alignment task and propose several tokenization strategies to obtain well-segmented parallel corpora. Using these new techniques, we were able to improve baseline word-based alignment models for six language pairs.
In machine translation, the alignment of corpora has evolved into a mature research area, aimed at p...
We present an algorithm for bilingual word alignment that extends previous work by treating multi-wo...
In this paper we describe a statistical technique for aligning sentences with their translations in...
All state of the art statistical machine translation systems and many example-based mach...
Learning word alignments between parallel sentence pairs is an important task in Statistical Machine...
Training a state-of-the-art syntax-based statistical machine translation (MT) system to translate fr...
Automatic word alignment is a key step in training statistical machine translation systems. Despite ...
In this paper, we describe the architecture of a sub-sentential alignment system that links linguist...
Bilingual lexicons of multiword expressions play a vital role in several natural language processing...