In this paper, we introduce a massively multilingual speech corpus with fine-grained phonemic transcriptions, encompassing more than 115 languages from diverse language families. Based on this multilingual dataset, we propose CLAP-IPA, a multilingual phoneme-speech contrastive embedding model capable of open-vocabulary matching between speech signals and phonemically transcribed keywords or arbitrary phrases. The proposed model has been tested on two fieldwork speech corpora in 97 unseen languages, exhibiting strong generalizability across languages. Comparison with a text-based model shows that using phonemes as modeling units enables much better crosslinguistic generalization than orthographic text. Comment: Preprint; Work in Progress
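The abstract above describes a CLIP-style contrastive objective between speech and phoneme-sequence embeddings. The sketch below illustrates that training signal in minimal form: it is not the authors' implementation, and the encoders here are hypothetical stand-ins (random projections over mean-pooled inputs) for the real speech and phoneme encoders. Matched speech/phoneme pairs sit on the diagonal of the similarity matrix, and a symmetric InfoNCE loss pulls them together.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Hypothetical stand-ins for the two encoders: each maps a
# variable-length input to a single L2-normalized embedding.
def encode_speech(feats, W):
    # feats: (batch, frames, feat_dim) -> (batch, emb_dim)
    return l2_normalize(feats.mean(axis=1) @ W)

def encode_phonemes(ids, emb_table):
    # ids: (batch, phones) -> (batch, emb_dim)
    return l2_normalize(emb_table[ids].mean(axis=1))

batch, frames, feat_dim, emb_dim, vocab, plen = 4, 50, 80, 16, 40, 6
W = rng.normal(size=(feat_dim, emb_dim))
emb_table = rng.normal(size=(vocab, emb_dim))

speech = rng.normal(size=(batch, frames, feat_dim))   # toy features
phones = rng.integers(0, vocab, size=(batch, plen))   # toy phoneme IDs

s = encode_speech(speech, W)            # (batch, emb_dim)
p = encode_phonemes(phones, emb_table)  # (batch, emb_dim)

# Scaled cosine similarities; matched pairs are on the diagonal.
tau = 0.07
logits = (s @ p.T) / tau

def xent_diag(logits):
    # Cross-entropy with the diagonal as the target class per row.
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(logp))

# Symmetric InfoNCE: speech-to-phoneme plus phoneme-to-speech.
loss = 0.5 * (xent_diag(logits) + xent_diag(logits.T))
```

At inference, the same similarity matrix supports open-vocabulary matching: any phonemically transcribed keyword can be embedded and scored against speech segments, with no fixed output vocabulary.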
Copyright © 2014 ISCA. Developing high-performance speech processing systems for low-resource langua...
Rapid deployment of automatic speech recognition (ASR) in new languages, with very limited data, is ...
While pretrained language models (PLMs) primarily serve as general purpose text encoders that can be...
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech r...
This paper presents a state-of-the-art model for transcribing speech in any language into the Intern...
This work investigates the use of large-scale, pre-trained models (CLIP and HuBERT) for multilingual...
We present a method for cross-lingual training of an ASR system using absolutely no transcribed trainin...
In this paper, we study the disentanglement of speaker and language representations in non-autoregre...
Pretrained multilingual language models have become a common tool in transferring NLP capabilities t...
We present SpeechMatrix, a large-scale multilingual corpus of speech-to-speech translations mined fr...
Only a handful of the world’s languages are abundant with the resources that enable practical applic...
Most state-of-the-art spoken language identification models are closed-set; in other words, they can...
Previous cross-lingual transfer methods are restricted to orthographic representation learning via t...
Current speech recognition systems tend to be developed only for commercially viable languages. The ...
Bilingual Word Embeddings (BWEs) are one of the cornerstones of cross-lingual transfer of NLP models...