Alignment of parallel corpora is a crucial step prior to training statistical language models for machine translation. This paper investigates compression-based methods for aligning sentences in an English-Chinese parallel corpus. Four metrics for matching sentences, used to measure alignment at the sentence level, are compared: the standard sentence length ratio (SLR) and three new metrics, namely absolute sentence length difference (SLD), compression code length ratio (CR), and absolute compression code length difference (CD). Initial experiments with CR show that using the Prediction by Partial Matching (PPM) compression scheme, a method that also performs well at many language modeling tasks, significantly outperforms the other stand...
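The four metrics named in the abstract can be illustrated with a minimal sketch. Several things here are assumptions, not the paper's method: the abstract does not give exact definitions, so sentence length is taken as the UTF-8 byte count, each ratio is taken as larger over smaller (so 1.0 means a perfect match), and `zlib` (DEFLATE) from the Python standard library stands in for PPM, which the standard library does not provide. The names `code_length`, `symmetric_ratio`, and `metrics` are hypothetical.

```python
import zlib


def code_length(text: str) -> int:
    """Compressed size of `text` in bytes.

    ASSUMPTION: zlib (DEFLATE) is a stand-in for the PPM scheme named in
    the abstract; a real PPM coder would give different code lengths.
    """
    return len(zlib.compress(text.encode("utf-8"), 9))


def symmetric_ratio(a: int, b: int) -> float:
    """Larger value divided by smaller value (ASSUMED ratio direction)."""
    lo, hi = min(a, b), max(a, b)
    if lo == 0:
        # Two empty inputs match perfectly; one empty input never matches.
        return 1.0 if hi == 0 else float("inf")
    return hi / lo


def metrics(source: str, target: str) -> dict:
    """Compute illustrative SLR, SLD, CR, and CD values for a sentence pair."""
    # ASSUMPTION: sentence length is measured in UTF-8 bytes.
    len_s = len(source.encode("utf-8"))
    len_t = len(target.encode("utf-8"))
    code_s = code_length(source)
    code_t = code_length(target)
    return {
        "SLR": symmetric_ratio(len_s, len_t),   # sentence length ratio
        "SLD": abs(len_s - len_t),              # absolute sentence length difference
        "CR": symmetric_ratio(code_s, code_t),  # compression code length ratio
        "CD": abs(code_s - code_t),             # absolute code length difference
    }


if __name__ == "__main__":
    pair = ("The weather is nice today.", "今天天气很好。")
    for name, value in metrics(*pair).items():
        print(name, value)
```

Under these assumptions, a candidate pair whose ratios are close to 1.0 and whose differences are close to 0 would be treated as a more plausible alignment. General-purpose compressors such as zlib add fixed overhead to very short inputs, so the code-length values for single sentences are only indicative.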
The parameters of statistical translation models are typically estimated from sentence-aligned paral...
A wide spectrum of multilingual applications have aligned parallel corpora as their prerequisite. Th...
In statistical machine translation, large numbers of parallel sentences are required to train the mo...
Parallel bilingual corpora are important basic resources for statistical machine translation. Accura...
We describe our experience with automatic alignment of sentences in parallel English-Chinese texts. ...
In this paper we describe a statistical technique for aligning sentences with their translations in...
The paper is about developing an aligned English-Myanmar parallel corpus. This paper will describe m...
Word alignment in bilingual or multilingual parallel corpora has been a challenging issue for natura...
Statistically training a machine translation model requires a parallel corpus containing a huge amo...
This paper describes a method for the automatic alignment of parallel texts at clause level. The met...
Bilingual alignment is a crucial problem in the research of natural language processing, and word al...
We explore the usability of different bilingual corpora for the purpose of multilingual and cross-li...
All state-of-the-art statistical machine translation systems and many example-based mach...
This paper describes the construction of a large-scale (more than 500,000 sentence pairs) Chinese-Englis...
This paper first describes an experiment to construct an English-Chinese parallel corpus, then apply...