Statistical machine translation relies heavily on parallel corpora to train its models. Although bilingual corpora are increasingly available, the quality of their sentence pairs must be taken into consideration. This paper presents a novel lattice score-based data cleaning method that selects suitable sentence pairs from those extracted from a bilingual corpus by sentence alignment methods. The proposed method proceeds as follows: first, an initial phrase-based model is trained on the full sentence-aligned corpus; second, for each sentence pair in the corpus, word alignments are used to create anchor pairs and source-side lattices; third, based on the translation model, target-side...
The quality of bilingual data is a key factor in Statistical Machine Translation (SMT). Low-quality ...
In the last decade, while statistical machine translation has advanced significantly, there is still...
We use existing tools to automatically build two parallel treebanks from existing parallel corpora. ...
State-of-the-art statistical machine translation systems make use of a large translation table obta...
In most statistical machine translation (SMT) systems, bilingual segments are extracted via word a...
We present a method for improving statistical machine translation performance by using linguisticall...
All state-of-the-art statistical machine translation systems and many example-based mach...
This thesis develops a robust inventory of large-scale lattice rescoring methods that improve the qu...
Machine translation is the task of automatically translating a text from one natural language into a...
The parameters of statistical translation models are typically estimated from sentence-aligned paral...
The goal of a machine translation (MT) system is to automatically translate a document written in so...