Many approaches for harvesting comparable data from the Web have been developed to date. Nevertheless, building a high-quality comparable corpus on a specific topic is not straightforward. We present a model for the automatic extraction of comparable texts in multiple languages and on specific topics from Wikipedia. To demonstrate the value of the model, we automatically extract parallel sentences from the comparable collections and use them to train statistical machine translation engines for specific domains. Our experiments on the English–Spanish pair in the domains of Computer Science, Science, and Sports show that our in-domain translator performs significantly better than a generic one when translating in-domain Wiki...
Building parallel resources for corpus based machine translation, especially Statistical Machine Tra...
Previously, statistical machine translation (SMT) models have been estimated from parallel corpora, ...
Wikipedia has been used as a source of comparable texts for a range of tasks, such as Statistical Ma...
Parallel sentences are a relatively scarce but extremely useful resource for many applicatio...
The article presents experiments on mining Wikipedia for extracting SMT useful sentence pairs in thr...
Corpus Resources for Descriptive and Applied Studies. Current Challenges and Future Directions: Sele...
Parallel corpora are not available for all domains and languages, but statistical methods in...
This paper describes a method for extracting parallel sentences from comparable texts. We present th...
We propose a language-independent graph-based method to build a-la-carte article collections on user...
Statistical Machine Translation (SMT) relies on the availability of rich parallel corpora. However, ...