The CMU Wilderness Multilingual Speech Dataset (Black, 2019) is a newly published multilingual speech dataset based on recorded readings of the New Testament. It provides data to build Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) models for potentially 700 languages. However, the fact that the source content (the Bible) is the same for all the languages has not been exploited to date. This article therefore proposes adding multilingual links between speech segments in different languages, and shares a large and clean dataset of 8,130 parallel spoken utterances across 8 languages (56 language pairs). We name this corpus MaSS (Multilingual corpus of Sentence-aligned Spoken utterances). The covered languages (Ba...
This dataset contains the first electronic speech corpus of Maaloula Aramaic, an endangered Western ...
This dataset is a corpus of sentence-aligned triples of German audio, German text, and English trans...
Statistically training a machine translation model requires a parallel corpus containing a huge amo...
The CMU Wilderness Multilingual Speech Dataset is a newly published multilingual speech da...
Recent works in spoken language translation (SLT) have attempted to build end-...
BibleTTS is a large, high-quality, open speech dataset for ten languages spoken in Sub-Saharan Afric...
We present SpeechMatrix, a large-scale multilingual corpus of speech-to-speech translations mined fr...
To support machine learning of cross-language prosodic mappings and other ways to improve speech-to-...
The research described in the present article is an extensive study of the European Parliament Inter...
End-to-end spoken language translation (SLT) has recently gained popularity thanks to the advancemen...
The recent uprise of end-to-end speech translation models requires a new generation of parallel corp...
This article presents multimodal and parallel data collections in Mboshi, as p...