This paper describes the joint effort of Brno University of Technology (BUT), AGH University of Krakow and University of Buenos Aires on the development of Automatic Speech Recognition systems for the CHiME-7 Challenge. We train and evaluate various end-to-end models with several toolkits. We rely heavily on Guided Source Separation (GSS) to convert multi-channel audio to a single channel. The ASR systems leverage speech representations from models pre-trained with self-supervised learning, and we fuse several ASR systems. In addition, we modify external data from the LibriSpeech corpus to bring it closer to the target domain and add it to the training data. Our efforts focus on the far-field acoustic robustness sub-track of Task 1 - Distant Au...
Speech enhancement and automatic speech recognition (ASR) are most often evalu...
While Automatic Speech Recognition (ASR) models have shown significant advances with the introductio...
Supervised speech enhancement models are trained using artificially generated ...
Distant-microphone automatic speech recognition (ASR) remains a challenging go...
The CHiME challenge series aims to advance far field speech recognition techno...
Distant-microphone automatic speech recognition (ASR) remains a challenging go...
The CHiME challenge series aims to advance robust automatic speech recognition...
This paper presents the design and outcomes of the CHiME-3 challenge, the firs...
This paper presents the design and outcomes of the CHiME-3 challenge, the first open speech recognit...
Distant microphone speech recognition systems that operate with humanlike robu...
We present the NVIDIA NeMo team's multi-channel speech recognition system for the 7th CHiME Challeng...
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech r...
We explore cross-lingual multi-speaker speech synthesis and cross-lingual voice conversion applied t...
Submitted to ICASSP 2020. We consider the problem of robust automatic speech rec...
Traditionally, research in automated speech recognition has focused on local-first encoding of audio...