When parallel or comparable corpora are harvested from the web, there is typically a tradeoff between the size and quality of the data. To improve quality, corpus collection efforts often attempt to fix or remove misaligned sentence pairs. At the same time, statistical machine translation (SMT) systems are widely assumed to be relatively robust to sentence alignment errors, yet there is little empirical evidence to support and characterize this robustness. This contribution investigates the impact of sentence alignment errors on a typical phrase-based SMT system. We confirm that SMT systems are highly tolerant to noise and that performance degrades seriously only at very high noise levels. Our findings suggest that when ...
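Measuring robustness at controlled noise levels requires a way to corrupt a clean corpus. As an illustration only, assuming misalignment is simulated by permuting the target sides of a randomly chosen fraction of sentence pairs, the hypothetical helper `inject_misalignment` below builds a corrupted copy of a corpus at a given `noise_rate`. This is a sketch, not necessarily the procedure used in the study.

```python
import random


def inject_misalignment(pairs, noise_rate, seed=0):
    """Return a copy of `pairs` where about `noise_rate` of the sentence
    pairs are misaligned by permuting their target sides.

    Hypothetical helper: one simple way to simulate sentence alignment
    errors, assumed for illustration.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)
    n = len(pairs)
    k = round(noise_rate * n)
    # Pick which pairs to corrupt.
    idx = rng.sample(range(n), k)
    targets = [pairs[i][1] for i in idx]
    # Rotate the chosen targets by one position: when k >= 2 and the
    # selected targets are distinct, every selected pair receives another
    # pair's target. With k == 1 the single pair is left unchanged.
    rotated = targets[1:] + targets[:1]
    noisy = list(pairs)
    for i, tgt in zip(idx, rotated):
        noisy[i] = (pairs[i][0], tgt)
    return noisy


corpus = [("das Haus", "the house"), ("ein Buch", "a book"),
          ("gut", "good"), ("ja", "yes")]
noisy = inject_misalignment(corpus, noise_rate=0.5)
```

Training the same phrase-based system on copies corrupted at increasing `noise_rate` values and comparing translation quality would trace a degradation curve of the kind described above. Rotating targets among the selected pairs keeps the corpus vocabulary and sentence-length distribution unchanged, so only the alignment is affected.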
In most statistical machine translation (SMT) systems, bilingual segments are extracted via word ali...
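Extracting bilingual segments from word alignments is commonly done by enumerating phrase pairs that are consistent with the alignment. The sketch below is a simplified version of that standard heuristic, assuming alignments are given as a set of `(i, j)` index links; the function name `extract_phrases` and the `max_len` limit are illustrative choices, and unaligned boundary words are not handled.

```python
def extract_phrases(src, tgt, alignment, max_len=3):
    """Extract phrase pairs consistent with a word alignment.

    src, tgt: token lists; alignment: set of (i, j) links meaning src[i]
    is aligned to tgt[j]. A pair (src[i1..i2], tgt[j1..j2]) is consistent
    if it contains at least one link and no link connects a word inside
    the pair to a word outside it. This sketch does not extend phrases
    over unaligned boundary words.
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span.
            tps = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tps:
                continue
            j1, j2 = min(tps), max(tps)
            if j2 - j1 + 1 > max_len:
                continue
            # Reject if a target word in [j1, j2] links outside [i1, i2].
            if any(j1 <= j <= j2 and not i1 <= i <= i2
                   for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs
```

For instance, with `src = ["a", "b"]`, `tgt = ["y", "x"]` and the crossing links `{(0, 1), (1, 0)}`, the result contains `("a", "x")`, `("b", "y")` and the reordered pair `("a b", "y x")`.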
Automatic word alignment is a key step in training statistical machine translation systems. Despite ...
Statistical word alignments represent lexical word-to-word translations between source and target la...
Most statistical machine translation systems employ a word-based alignment model. In this paper we d...
The training process of the translation model in statistical machine translation requires a sentence...
The parameters of statistical translation models are typically estimated from sentence-aligned paral...
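Estimating translation model parameters from sentence-aligned data without word-level annotation is classically done with expectation maximization (EM). As one concrete instance, chosen here for illustration and not necessarily the model the cited work uses, the sketch below trains IBM Model 1 lexical probabilities t(f | e). The function name `ibm_model1` and the `NULL` token spelling are assumptions.

```python
from collections import defaultdict


def ibm_model1(corpus, iterations=10):
    """Estimate lexical translation probabilities t(f | e) with EM.

    corpus: list of (e_tokens, f_tokens) sentence pairs. A NULL token is
    prepended to every e sentence so that f words may stay unaligned.
    Returns a dict keyed by (f, e).
    """
    f_vocab = {f for _, fs in corpus for f in fs}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # t[(f, e)], uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for es, fs in corpus:
            es = ["NULL"] + es
            for f in fs:
                # E-step: posterior over which e generated this f.
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p
        # M-step: renormalize expected counts so sum_f t(f | e) = 1.
        t = defaultdict(float,
                        {fe: c / total[fe[1]] for fe, c in count.items()})
    return t


corpus = [(["the", "house"], ["das", "Haus"]),
          (["the", "book"], ["das", "Buch"]),
          (["a", "book"], ["ein", "Buch"])]
t = ibm_model1(corpus)
```

Because "the" co-occurs with "das" in two sentences but with "Haus" in only one, EM shifts probability mass toward t(das | the); no word-level links are needed as input, only the sentence pairing.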
All state-of-the-art statistical machine translation systems and many example-based mach...
In most statistical machine translation (SMT) systems, bilingual segments are extracted via word ali...
In recent years, researchers have conducted several studies to evaluate the machine translation quality...
In most statistical machine translation (SMT) systems, bilingual segments are extracted via word al...
The goal of a machine translation (MT) system is to automatically translate a document written in so...
In most statistical machine translation (SMT) systems, bilingual segments are extracted via word a...
In machine translation, the alignment of corpora has evolved into a mature research area, aimed at p...
The empirical adequacy of synchronous context-free grammars of rank two (2-SCFGs) (Satta and Peseric...
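One way to probe the empirical adequacy of rank-two grammars is to test whether observed alignments can be derived with binary rules at all. Assuming, for simplicity, a one-to-one alignment written as a permutation (real word alignments are many-to-many, so this is a simplification), the shift-reduce sketch below merges neighboring blocks whose target positions are contiguous; a permutation is derivable with binary (straight or inverted) rules exactly when everything reduces to a single block. The name `is_binarizable` is hypothetical.

```python
def is_binarizable(perm):
    """Return True if a permutation can be derived with binary
    synchronous rules (straight or inverted order).

    perm[i] is the target position of the i-th source word. Each word
    starts as a block (lo, hi) of target positions; the top two stack
    blocks are merged whenever their target ranges are adjacent.
    """
    stack = []
    for p in perm:
        stack.append((p, p))
        # Reduce while the top two blocks cover adjacent target ranges.
        while len(stack) >= 2:
            (lo1, hi1), (lo2, hi2) = stack[-2], stack[-1]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    return len(stack) <= 1
```

For example, `[1, 0, 3, 2]` reduces to one block, while `[1, 3, 0, 2]` (the pattern usually written 2413) cannot be reduced. Greedy merging is safe here because the union of two overlapping blocks that are contiguous in both source and target positions is again such a block.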