For endangered languages, data collection campaigns must accommodate the fact that many of these languages have an oral tradition and that producing transcriptions is costly. It is therefore essential to translate the recordings into a widely spoken language to ensure that they remain interpretable. In this paper, we investigate how the choice of translation language affects subsequent documentation work and the automatic approaches that may operate on the resulting bilingual corpus. To answer this question, we use the MaSS multilingual speech corpus (Boito et al., 2020) to create 56 bilingual pairs, which we apply to the task of low-resource unsupervised word segmentation and alignment. Our results highlight ...
Current methods for word alignment require considerable amounts of parallel text to deliver accurate...
In statistical machine translation, large numbers of parallel sentences are required to train the ...
Attention-based sequence-to-sequence neural machine translation systems have b...
For endangered languages, data collection campaigns have to accommodate the challenge that many of t...
We investigate induction of a bilingual lexicon from a corpus of phonemic transcriptions that have b...
One of the basic tasks of computational language documentation (CLD) is to ide...
We propose a method that bilingually segments sentences in languages with no clear delim...
Electronic publication date: 3 November 2021. Since there are no systematic pauses delimiting...
This paper introduces some key aspects of machine translation in order to situate the role of the bi...
© Springer-Verlag. Abstract. Most methods to extract bilingual lexicons from parallel corpora learn...
Abstract. This paper introduces some key aspects of machine translation in order to situate the role...
The attention mechanism in Neural Machine Translation (NMT) models added flexi...
In this paper, we propose an algorithm for aligning words with their translation in a bilingual corp...
Statistically training a machine translation model requires a parallel corpus containing a huge amo...
We introduce a bilingually motivated word segmentation approach to languages where word boundaries a...