We propose a method that bilingually segments sentences in languages with no clear delimiter for word boundaries. In our model, we first convert the search for the segmentation into a sequential tagging problem, which admits a polynomial-time dynamic-programming solution, and incorporate a control to balance the monolingual and bilingual information at hand. Our bilingual segmentation algorithm, which integrates a monolingual language model with a statistical translation model, is designed to tokenize sentences in a way better suited to bilingual applications such as word alignment and machine translation. Empirical results show that bilingually motivated segmenters outperform purely monolingual ones in both the word-aligning (12% reduction in...
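The abstract names two ingredients: a dynamic-programming search over segmentations (equivalently, a B/I tag sequence over characters) and a control that balances monolingual against bilingual evidence. The snippet below is a minimal sketch of that general idea under stated assumptions, not the authors' model: it assumes each candidate word has a monolingual log-probability and a bilingual log-score looked up from hypothetical tables (`mono_logprob`, `bi_logprob`), interpolates them with a hypothetical weight `lam`, caps word length at a hypothetical `max_word_len`, and penalizes unknown strings with a hypothetical `unk_logprob` per character. The bilingual table merely stands in for whatever translation-model quantity the paper actually uses.

```python
import math
from typing import Dict, List


def segment(
    sentence: str,
    mono_logprob: Dict[str, float],   # hypothetical monolingual word scores
    bi_logprob: Dict[str, float],     # hypothetical bilingual (translation) scores
    lam: float = 0.5,                 # hypothetical control: 0 = monolingual only, 1 = bilingual only
    max_word_len: int = 4,            # hypothetical cap on candidate word length
    unk_logprob: float = -20.0,       # hypothetical per-character penalty for unseen strings
) -> List[str]:
    """Viterbi-style DP over word-end positions; O(n * max_word_len) time."""
    n = len(sentence)
    best = [-math.inf] * (n + 1)  # best[i]: best score of any segmentation of sentence[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word in that segmentation
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = sentence[start:end]
            m = mono_logprob.get(word, unk_logprob * len(word))
            b = bi_logprob.get(word, unk_logprob * len(word))
            # Linear interpolation of the two log-scores, weighted by the control lam.
            score = best[start] + (1.0 - lam) * m + lam * b
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow back-pointers from the end of the sentence to recover the words.
    words: List[str] = []
    end = n
    while end > 0:
        start = back[end]
        words.append(sentence[start:end])
        end = start
    return words[::-1]


def to_tags(words: List[str]) -> List[str]:
    """Express a segmentation as a per-character B(egin)/I(nside) tag sequence."""
    return [("B" if i == 0 else "I") for w in words for i in range(len(w))]


if __name__ == "__main__":
    # Toy tables over ASCII "characters"; the numbers are illustrative only.
    mono = {"ab": -1.0, "cd": -1.0, "abc": -1.5, "d": -1.5}
    bi = {"ab": -2.0, "cd": -2.0, "abc": -0.5, "d": -0.5}
    print(segment("abcd", mono, bi, lam=0.0))  # monolingual evidence only
    print(segment("abcd", mono, bi, lam=1.0))  # bilingual evidence only
```

With these toy tables, `lam=0.0` yields `['ab', 'cd']` (tags `B I B I`), while `lam=1.0` yields `['abc', 'd']` (tags `B I I B`), illustrating how a single control can shift the segmentation toward units that the bilingual side prefers. Because every step only looks back at most `max_word_len` characters, the search stays polynomial in sentence length.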
© Springer-Verlag. Abstract. Most methods to extract bilingual lexicons from parallel corpora learn...
Abstract. Parallel text alignment is a special type of pattern recognition task aimed at discovering th...
In this paper, we propose an algorithm for aligning words with their translations in a bilingual corp...
We introduce a bilingually motivated word segmentation approach to languages where word boundaries a...
We introduce a word segmentation approach to languages where word boundaries are not orthographicall...
For endangered languages, data collection campaigns have to accommodate the ch...
We investigate induction of a bilingual lexicon from a corpus of phonemic transcriptions that have b...
For endangered languages, data collection campaigns have to accommodate the challenge that many of t...
One of the basic tasks of computational language documentation (CLD) is to identify word boundaries ...
We introduce a word segmentation approach to languages where word boundaries are not orthographica...
We present an unsupervised word segmentation model for machine translation. The model uses existing ...
In statistical machine translation, large numbers of parallel sentences are required to train the mo...
Aiming to overcome the shortcomings of word-based ... Abstract. In this paper, a new algorithm called Multi-...
In the last decade, while statistical machine translation has advanced significantly, there is still...
In this paper, we present a new word alignment combination approach for language pairs where one lang...