Sentence-aligned bilingual texts are a crucial resource for building statistical machine translation (SMT) systems. In this paper we propose to apply lightly-supervised training to produce additional parallel data. The idea is to translate large amounts of monolingual data (up to 275M words) with an SMT system and to use the resulting automatic translations as additional training data. Results are reported for translation from French into English. We consider two setups: in the first, the initial SMT system is trained on only a very limited amount of human-produced translations; in the second, more than 100 million words are available. In both conditions, lightly-supervised training achieves significant improvements in BLEU score.
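The data flow described above can be illustrated with a minimal sketch. It is not the system used in the paper: `augment_bitext`, `toy_translate`, and the tiny lexicon are hypothetical names introduced only for illustration, and the word-by-word lookup stands in for a real SMT decoder. The sketch shows only how automatic translations of monolingual source text would be appended to the human-produced bitext before retraining.

```python
from typing import Callable, Iterable, List, Tuple

# A sentence pair is (French source, English target).
SentencePair = Tuple[str, str]


def augment_bitext(
    human_bitext: List[SentencePair],
    monolingual_source: Iterable[str],
    translate: Callable[[str], str],
) -> List[SentencePair]:
    """Return the human bitext followed by automatically translated pairs.

    `translate` is assumed to wrap an already trained SMT system; here it
    is any function mapping a source sentence to a target sentence.
    """
    augmented = list(human_bitext)  # copy, so the input list is not modified
    for sentence in monolingual_source:
        sentence = sentence.strip()
        if not sentence:
            continue  # skip empty lines in the monolingual data
        augmented.append((sentence, translate(sentence)))
    return augmented


def toy_translate(sentence: str) -> str:
    """Hypothetical stand-in for an SMT decoder: word-by-word lookup."""
    lexicon = {"le": "the", "chat": "cat", "dort": "sleeps"}
    return " ".join(lexicon.get(word, word) for word in sentence.lower().split())


human = [("bonjour", "hello")]
monolingual = ["Le chat dort", "", "le chat"]
training_data = augment_bitext(human, monolingual, toy_translate)
```

In this sketch every automatic translation is kept. Whether and how the translations are filtered before retraining is not stated in the text above, so no filtering step is shown.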
In the past few decades machine translation research has made major progress. A researcher now has a...
This paper investigates optimal ways to get maximal coverage from minimal input training corpus. In ...
We use bilingual lexicon induction techniques, which learn translations from monolingual texts in t...
We report on findings of exploiting large data sets for translation modeling, language modeling and...
Statistical machine translation relies heavily on available parallel corpora, but SMT may not have t...
The amount of training data in statistical machine translation is critical for translation quality. ...
Statistical Machine Translation (SMT) models learn how to translate by examining a bilingual paralle...
Since the eighties, new approaches in machine translation have been explored. These new approaches a...
Statistical Machine Translation (SMT) systems are usually trained on large amounts of bili...
We collected a corpus of parallel text in 11 languages from the proceedings of the European Parlia...
Statistical machine translation systems are usually trained on large amounts of bilingual text and o...
Statistical machine translation systems are usually trained on large amounts of bilingual text and m...
Parallel corpus is an indispensable resource for translation model training in statistical machine t...
Automatic translation from one human language to another using computers, better known as machine tr...