This article presents experiments on mining Wikipedia to extract sentence pairs useful for SMT in three language pairs. Each extracted sentence pair is assigned a cross-lingual lexical similarity score. Based on this score, several evaluations were conducted to estimate the similarity thresholds that allow extraction of the most useful data for training SMT systems for the three language pairs. The experiments showed that, for similarity scores higher than 0.7, all sentence pairs in the three language pairs were fully parallel. However, including less parallel sentence pairs (that is, pairs with a lower similarity score) in the training sets yielded significant improvements in translation quality (BLEU-based evaluations). The optimized SMT sys...
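The threshold-based selection described above can be illustrated with a minimal sketch. It assumes each mined pair already carries a similarity score in [0, 1]. The `ScoredPair` type, the `filter_by_threshold` helper, and the sample sentences and scores are hypothetical and are not taken from the article; the article's actual scoring and extraction pipeline is not shown here.

```python
from typing import List, NamedTuple


class ScoredPair(NamedTuple):
    """A mined sentence pair with its cross-lingual similarity score (hypothetical type)."""
    source: str
    target: str
    score: float


def filter_by_threshold(pairs: List[ScoredPair], threshold: float) -> List[ScoredPair]:
    """Keep only the pairs whose similarity score is at least `threshold`."""
    return [p for p in pairs if p.score >= threshold]


# Illustrative data only; sentences and scores are invented for this sketch.
pairs = [
    ScoredPair("The city lies on the river.", "Orasul se afla pe rau.", 0.90),
    ScoredPair("It was founded in 1850.", "A fost fondat in 1850.", 0.75),
    ScoredPair("The museum opened later.", "Muzeul are multe exponate.", 0.50),
    ScoredPair("See also the list of rivers.", "Clima este temperata.", 0.30),
]

# A strict threshold keeps fewer, more parallel pairs;
# a looser one trades parallelism for more training data.
strict = filter_by_threshold(pairs, 0.7)
loose = filter_by_threshold(pairs, 0.4)
```

Lowering the threshold enlarges the training set with less parallel pairs, which is the trade-off the abstract reports evaluating with BLEU.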
We present a novel paradigm for obtaining large amounts of training data for computational linguisti...
Wikipedia articles in different languages have been mined to support various tasks, such as Cross-La...
Previously, statistical machine translation (SMT) models have been estimated from parallel corpora, ...
Multiple approaches to gathering comparable data from the Web have been developed to date. Neverthele...
Parallel sentences are a relatively scarce but extremely useful resource for many applicatio...
We propose the framework of a Machine Translation (MT) bootstrapping method by using multilingual Wi...
Building parallel resources for corpus-based machine translation, especially Statistical Machine Tra...
Thesis (Ph.D.)--University of Washington, 2014. Machine translation, the computerized translation of o...
Statistical Machine Translation (SMT) relies on the availability of rich parallel corpora. However, ...
This paper describes a method for extracting parallel sentences from comparable texts. We present th...
Wikipedia has been used as a source of comparable texts for a range of tasks, such as Statistical Ma...