Almost none of the 2,000+ languages spoken in Africa have widely available automatic speech recognition systems, and the data required to build them is available for only a few languages. We experimented with two techniques that may provide pathways to large-vocabulary speech recognition for African languages: multilingual modeling and self-supervised learning. We gathered available open-source data and collected data for 15 languages, and trained experimental models using these techniques. Our results show that pooling the small amounts of available data in multilingual end-to-end models, and pre-training on unlabeled data, can help improve speech recognition quality for many African languages.
South Africa has eleven official languages, ten of which are considered “resource-scarce”. For these...
Development of fully featured Automatic Speech Recognition (ASR) systems for a complete language voc...
In this study, we present improvements in N-best rescoring of code-switched speech achieved by n-gra...
This article presents the data collected and ASR systems developed for 4 sub-...
There are over 7000 languages spoken on earth, but many of these languages suffer from a dearth of n...
Language models are the foundation of current neural network-based models for natural language under...
At present, Siri, Dragon Dictate, Google Voice, and Alexa-like functionalities are not available in ...
For many of the 700 million illiterate people around the world, speech recognition technology could ...
While building automatic speech recognition (ASR) requires a large amount of speech and text data, t...
We investigate the impact of recent advances in speech recognition techniques for under-resourced l...
Under-resourced speech recognizers may benefit from data in languages other than the target language...
Keyword spotting refers to the task of learning to detect spoken keywords. It interfaces all modern ...