We present SpeechMatrix, a large-scale multilingual corpus of speech-to-speech translations mined from real speech of European Parliament recordings. It contains speech alignments in 136 language pairs with a total of 418 thousand hours of speech. To evaluate the quality of this parallel speech, we train bilingual speech-to-speech translation models on mined data only and establish extensive baseline results on EuroParl-ST, VoxPopuli and FLEURS test sets. Enabled by the multilinguality of SpeechMatrix, we also explore multilingual speech-to-speech translation, a topic which was addressed by few other works. We also demonstrate that model pre-training and sparse scaling using Mixture-of-Experts bring large gains to translation performance. T...
This work investigates the use of large-scale, pre-trained models (CLIP and HuBERT) for multilingual...
Recently, we have seen an increasing interest in the area of speech-to-text translation. This has le...
In this paper, we introduce our work of building a Streaming Multilingual Speech Model (SM2), which ...
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech r...
Training speech translation (ST) models requires large and high-quality datasets. MuST-C is one of t...
We introduce CVSS, a massively multilingual-to-English speech-to-speech translation (S2ST) corpus, c...
End-to-end spoken language translation (SLT) has recently gained popularity thanks to the advancemen...
Recent works in spoken language translation (SLT) have attempted to build end-...
In this paper, we introduce a massively multilingual speech corpus with fine-grained phonemic trans...
Recently, end-to-end speech translation (ST) has gained significant attention as it avoids error pro...
This paper proposes Virtuoso, a massively multilingual speech-text joint semi-supervised learning fr...
The primary goal of FBK's systems submission to the IWSLT 2022 offline and simultaneous speech ...
The CMU Wilderness Multilingual Speech Dataset (Black, 2019) is a newly publis...
Transformer models using segment-based processing have been an effective architecture for simultaneo...