Parallel corpora are crucial for training SMT systems. However, for many language pairs they are available only in very limited quantities. For these language pairs a huge portion of phrases encountered at run-time will be unknown. We show how techniques from paraphrasing can be used to deal with these otherwise unknown source language phrases. Our results show that augmenting a state-of-the-art SMT system with paraphrases leads to significantly improved coverage and translation quality. For a training corpus with 10,000 sentence pairs we increase the coverage of unique test set unigrams from 48% to 90%, with more than half of the newly covered items accurately translated, as opposed to none in current approaches.
For resource-limited language pairs, coverage of the test set by the parallel corpus is an important...
Statistical methods have proven to be very effective when addressing linguistic problems, especially ...
This paper describes FUN-NRC group’s machine translation systems that participated in the NTCIR-10 ...
Statistical Machine Translation (SMT) is the task of automatic translation between two natural langu...
Untranslated words still constitute a major problem for Statistical Machine Translation (SMT), and...
This paper proposes a novel method that exploits multiple resources to improve statistical machine...
We apply statistical machine translation (SMT) tools to generate novel paraphrases of input sentence...
The multilingual Paraphrase Database (PPDB) is a freely available automatically created resource of ...
Large amounts of data are essential for training statistical machine translation systems. In this pa...
Most state-of-the-art statistical machine translation systems use log-linear models, which are defin...
We present a method for improving machine translation (MT) evaluation by targeted paraphrasing of r...
Statistical machine translation relies heavily on available parallel corpora, but SMT may not have t...
Abstract: We present a method for improving machine translation (MT) evaluation by targeted paraphra...