Building parallel resources for corpus-based machine translation, especially Statistical Machine Translation (SMT), from comparable corpora has recently received wide attention in the field of Machine Translation research. In this paper, we propose an automatic approach for extracting parallel fragments from comparable corpora. The comparable corpora are collected from Wikipedia documents, and the approach exploits the multilingualism of Wikipedia. The automatic alignment of parallel text fragments uses a textual entailment technique and a Phrase-Based SMT (PB-SMT) system. The extracted parallel text fragments are then used as additional parallel translation examples to complement the training data for a PB-SMT system. The additional ...
Statistical Machine Translation (SMT) relies on the availability of rich parallel corpora. However, ...
Parallel sentences are a relatively scarce but extremely useful resource for many applicatio...
This chapter gives an overview of parallel corpora, i.e. corpora containing source texts in a given ...
Although parallel sentences rarely exist in quasi-comparable corpora, there could be parallel fragme...
This paper proposes a novel method for exploiting comparable documents to generate parallel data fo...
We present a simple and effective method for extracting parallel sentences from comparable corpora. ...
Machine translation (MT), as a high-level application of natural language processing (NLP), is a po...
The development of broad domain statistical machine translation systems is gated by the availability...
In this work we present an approach for extracting parallel phrases from comparable news articles to...
Previously, statistical machine translation (SMT) models have been estimated from parallel corpora, ...
We use existing tools to automatically build two parallel treebanks from existing parallel corpora. ...
This paper discusses the role played by parallel corpora in the design and implementation of fully a...