Recent work on self-supervised pre-training focuses on leveraging large-scale unlabeled speech data to build robust end-to-end (E2E) acoustic models (AM) that can later be fine-tuned on downstream tasks, e.g., automatic speech recognition (ASR). Yet few works have investigated how performance is affected when data properties differ substantially between the pre-training and fine-tuning phases, a condition termed domain shift. We target this scenario by analyzing the robustness of Wav2Vec 2.0 and XLS-R models on downstream ASR for a completely unseen domain, air traffic control (ATC) communications. We benchmark these two models on several open-source and challenging ATC databases with signal-to-noise ratios between 5 and 20 dB. Relative word error ra...
Automatic Speech Recognition (ASR) can introduce higher levels of automation into Air Traffic Contro...
The Automated Speech Recognition (ASR) community experiences a major turning point with the rise of ...
The performance of speech recognition systems in translating voice to text is still an issue in la...
Wav2vec2.0 is a popular self-supervised pre-training framework for learning speech representations i...
Self-supervised pre-training could effectively improve the performance of low-resource automatic spe...
Funding Information: We are grateful for the Academy of Finland project funding, numbers: 337073, 34...
With timeliness and efficiency being critical in the aviation maintenance industry, the need has bee...
Automatic Speech Recognition (ASR) has recently proved to be a useful tool to reduce the workload of...
In automatic speech recognition systems (ASRs), training is a critical phase to the system's success...
Nowadays, recognizing and understanding human speech is quite popular through systems like Alexa®, t...
Unsupervised speech recognition has shown great potential to make Automatic Speech Recognition (ASR)...
Despite recent advancements in deep learning technologies, Child Speech Recognition remains a challe...