Combining recordings from multiple languages to train a single automatic speech recognition (ASR) model holds the promise of a universal speech representation. Recently, a Transformer encoder-decoder model has been shown to leverage multilingual data well in IPA transcriptions of languages presented during training. However, the representations it learned did not transfer successfully to unseen languages in a zero-shot setting. Because that model lacks an explicit factorization of the acoustic model (AM) and language model (LM), it is unclear to what degree the performance suffered from differences in pronunciation or from the mismatch in phonotactics. To gain more insight into the factors limiting zero-shot ASR transfer, we replac...
Spoken language translation (SLT) exists within one of the most challenging intersections of speech ...
Recently, end-to-end speech translation (ST) has gained significant attention as it avoids error pro...
Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and...
Only a handful of the world’s languages are abundant with the resources that enable practical applic...
Recent breakthroughs in automatic speech recognition (ASR) have resulted in a word error rate (WER) ...
The recent development of neural network-based automatic speech recognition (ASR) systems has greatl...
Multilingual automatic speech recognition (ASR) systems mostly benefit low-resource languages but su...
Despite recent advances in automatic speech recognition (ASR), the recognition of children’s speech ...
Rapid deployment of automatic speech recognition (ASR) in new languages, with very limited data, is ...
Exploiting cross-lingual resources is an effective way to compensate for data scarcity of low resour...
We present a method for cross-lingual training of an ASR system using absolutely no transcribed trainin...
Over the past decades, speech recognition has dramatically improved in a large variety of applicatio...
A state-of-the-art automatic speech recognition (ASR) system can often achieve high accuracy for mos...
Code-switching (CS) in spoken language is where the speech has two or more languages within an utter...
Automatic phonemic transcription tools are useful for low-resource language documentation. However, ...