This paper proposes a novel method for exploiting comparable documents to generate parallel data for machine translation. First, each source document is paired with each sentence of the corresponding target document; second, partial phrase alignments are computed within the paired texts; finally, fragment pairs across linked phrase-pairs are extracted. The algorithm has been tested on two recent challenging news translation tasks. Results show that mining for parallel fragments is more effective than mining for parallel sentences, and that comparable in-domain texts can be more valuable than parallel out-of-domain texts.
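The three steps above can be illustrated with a toy sketch. This is not the paper's implementation: it assumes a small hand-written phrase table (`PHRASE_TABLE`) and exact string matching in place of statistically computed phrase alignments, and every function name here (`find_phrase_links`, `extract_fragment`, `mine_fragments`) is hypothetical. It only shows the shape of the pipeline: pair the source document with each target sentence, find partial phrase links, then take the span covering the linked phrases as a candidate fragment pair.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy phrase table (source phrase -> target phrase).
# A real system would use a phrase table learned from seed parallel data.
PHRASE_TABLE: Dict[str, str] = {
    "prime minister": "premier ministre",
    "flood zone": "zone inondée",
}

Link = Tuple[int, int, int, int]  # (src_start, src_end, tgt_start, tgt_end)


def find_phrase_links(src: List[str], tgt: List[str],
                      table: Dict[str, str], max_len: int = 3) -> List[Link]:
    """Step 2 (simplified): link source phrases to exact target matches."""
    links: List[Link] = []
    for i in range(len(src)):
        for n in range(1, max_len + 1):
            if i + n > len(src):
                break
            translation = table.get(" ".join(src[i:i + n]))
            if translation is None:
                continue
            words = translation.split()
            for j in range(len(tgt) - len(words) + 1):
                if tgt[j:j + len(words)] == words:
                    links.append((i, i + n, j, j + len(words)))
    return links


def extract_fragment(src: List[str], tgt: List[str], links: List[Link],
                     min_links: int = 2) -> Optional[Tuple[str, str]]:
    """Step 3 (simplified): take the span covering all linked phrase pairs."""
    if len(links) < min_links:
        return None  # too little evidence that the texts overlap
    s0, s1 = min(l[0] for l in links), max(l[1] for l in links)
    t0, t1 = min(l[2] for l in links), max(l[3] for l in links)
    return " ".join(src[s0:s1]), " ".join(tgt[t0:t1])


def mine_fragments(src_document: str, tgt_sentences: List[str],
                   table: Dict[str, str]) -> List[Tuple[str, str]]:
    """Step 1: pair the source document with each target sentence."""
    src = src_document.lower().split()
    fragments = []
    for sentence in tgt_sentences:
        tgt = sentence.lower().split()
        fragment = extract_fragment(src, tgt, find_phrase_links(src, tgt, table))
        if fragment is not None:
            fragments.append(fragment)
    return fragments
```

Requiring at least two links before extracting a span is an arbitrary threshold chosen for the sketch; it stands in for whatever filtering a real system would apply to avoid extracting fragments from unrelated text.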
In this work we present an approach for extracting parallel phrases from comparable news articles to...
Building a robust MT system requires a sufficiently large parallel corpus to be available as trainin...
Parallel corpora are indispensable resources for a variety of multilingual natural language processi...
Parallel sentences are a relatively scarce but extremely useful resource for many applicatio...
Although parallel sentences rarely exist in quasi-comparable corpora, there could be parallel fragme...
Building parallel resources for corpus based machine translation, especially Statistical Machine Tra...
Achieving accurate translation, especially in multiple domain documents with statistical machine tra...
We present a novel method to detect parallel fragments within noisy parallel corpora. Isolating the...
Parallel text is one of the most valuable resources for development of statistical machine translati...
Parallel corpora are playing a crucial role in multilingual natural language processing. U...
The development of broad domain statistical machine translation systems is gated by the availability...
Extracting parallel data from comparable corpora in order to enrich existing statistical translation...
We present a simple and effective method for extracting parallel sentences from comparable corpora. ...
In statistical machine translation, large numbers of parallel sentences are required to train the mo...
In this thesis, we propose a content-based method of mining bilingual parallel documents from websit...