We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech, utilizing the unlabeled audio of the Libri-Light dataset. More precisely, we carry out noisy student training with SpecAugment, using giant Conformer models pre-trained with wav2vec 2.0. By doing so, we achieve word-error-rates (WERs) of 1.4%/2.6% on the LibriSpeech test/test-other sets, against the previous state-of-the-art WERs of 1.7%/3.3%.
Comment: 11 pages, 3 figures, 5 tables. Accepted to NeurIPS SAS 2020 Workshop; v2: minor errors corrected
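The SpecAugment step mentioned above amounts to masking random frequency bands and time spans of the input spectrogram during training. A minimal sketch of that masking idea follows; the mask counts and widths here are illustrative defaults, not the exact augmentation policy used in the paper.

```python
import numpy as np

def spec_augment(spec, num_freq_masks=2, freq_mask_width=27,
                 num_time_masks=2, time_mask_width=40, rng=None):
    """SpecAugment-style masking on a log-mel spectrogram of shape
    (time, freq): zero out a few random frequency bands and time spans.
    Parameter values are illustrative, not the paper's exact policy."""
    rng = rng or np.random.default_rng()
    spec = spec.copy()
    t, f = spec.shape
    # Frequency masking: zero a random band of consecutive mel channels.
    for _ in range(num_freq_masks):
        w = int(rng.integers(0, freq_mask_width + 1))
        f0 = int(rng.integers(0, max(1, f - w + 1)))
        spec[:, f0:f0 + w] = 0.0
    # Time masking: zero a random span of consecutive frames.
    for _ in range(num_time_masks):
        w = int(rng.integers(0, time_mask_width + 1))
        t0 = int(rng.integers(0, max(1, t - w + 1)))
        spec[t0:t0 + w, :] = 0.0
    return spec
```

In the noisy student setup, this augmentation is applied to the student model's inputs while the teacher labels unaugmented audio, which forces the student to be robust to the masked distortions.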
In recent years, speech-based self-supervised learning (SSL) has made significant progress in variou...
We aim at improving spoken language modeling (LM) using very large amount of automatically transcrib...
In this report, we describe our submitted system for track 2 of the VoxCeleb Speaker Recognition Cha...
We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models ...
Advances in self-supervised learning have significantly reduced the amount of transcribed audio requ...
Self-supervised speech recognition models require considerable labeled training data for learning hi...
Self-supervised pre-training could effectively improve the performance of low-resource automatic spe...
Multilingual speech recognition with supervised learning has achieved great results as reflected in ...
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech r...
Traditionally, research in automated speech recognition has focused on local-first encoding of audio...
Unsupervised speech recognition has shown great potential to make Automatic Speech Recognition (ASR)...
While Automatic Speech Recognition (ASR) models have shown significant advances with the introductio...
Despite recent advancements in deep learning technologies, Child Speech Recognition remains a challe...
Self-supervised learning via masked prediction pre-training (MPPT) has shown impressive performance ...
We investigate the performance of self-supervised pretraining frameworks on pathological speech data...