The bi-encoder structure has been intensively investigated in code-switching (CS) automatic speech recognition (ASR). However, most existing methods require the two monolingual ASR models (MAMs) to share the same structure, and they use only the encoders of the MAMs. As a result, pre-trained MAMs cannot be used promptly or fully for CS ASR. In this paper, we propose a monolingual recognizers fusion method for CS ASR. It has two stages: the speech awareness (SA) stage and the language fusion (LF) stage. In the SA stage, two independent MAMs map the acoustic features to two language-specific predictions. To keep each MAM focused on its own language, we further extend the language-aware training strategy to the MAMs. In the...
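As a rough illustration of the SA stage only, the sketch below passes the same acoustic frames through two independent recognizers and collects one language-specific prediction stream from each. Everything named here is an assumption made for illustration: `MonolingualRecognizer`, its random linear weights, the toy vocabularies, and `speech_awareness` are hypothetical stand-ins, not the pre-trained MAMs or the training strategy of the paper. The two stand-ins deliberately have different output sizes, reflecting the point that the MAMs need not share a structure.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MonolingualRecognizer:
    """Hypothetical stand-in for a pre-trained monolingual ASR model (MAM).

    A real MAM would be a full recognizer; here a random linear layer is
    used only so that the data flow can be executed end to end.
    """

    def __init__(self, vocab, feat_dim, seed):
        rng = random.Random(seed)
        self.vocab = list(vocab)
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(feat_dim)]
            for _ in self.vocab
        ]

    def predict(self, frames):
        """Map each acoustic frame to a posterior over this model's own vocabulary."""
        posteriors = []
        for frame in frames:
            logits = [sum(w * x for w, x in zip(row, frame)) for row in self.weights]
            posteriors.append(softmax(logits))
        return posteriors


def speech_awareness(frames, mam_a, mam_b):
    """SA-stage sketch: the same features go through two independent MAMs."""
    return mam_a.predict(frames), mam_b.predict(frames)


# Toy input: two frames of 3-dimensional acoustic features (made-up values).
frames = [[0.1, -0.3, 0.5], [0.7, 0.2, -0.4]]
mam_a = MonolingualRecognizer(["<blank>", "a1", "a2"], feat_dim=3, seed=0)
mam_b = MonolingualRecognizer(["<blank>", "b1", "b2", "b3"], feat_dim=3, seed=1)
pred_a, pred_b = speech_awareness(frames, mam_a, mam_b)
```

Each of `pred_a` and `pred_b` holds one posterior per frame over its own vocabulary. How the two streams are then combined belongs to the LF stage, which this sketch does not attempt to model.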
The phenomenon where a speaker mixes two or more languages within the same conversation is called co...
In this study, we present improvements in N-best rescoring of code-switched speech achieved by n-gra...
Some practical uses of ASR have been implemented, including the transcription of meetings and the us...
The dual-encoder structure successfully utilizes two language-specific encoders (LSEs) for code-switchin...
In this work, we seek to build effective code-switched (CS) automatic speech recognition systems (AS...
Code-switching deals with alternating languages in the communication process. Training end-to-end (E2E) ...
This paper describes the integration of language identification (LID) into a multilingual a...
Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and...
We propose a) a Language Agnostic end-to-end Speech Translation model (LAST), and b) a data augmenta...
End-to-end formulation of automatic speech recognition (ASR) and speech translation (ST) makes it ea...
Code-switching (CS) in spoken language is where the speech has two or more languages within an utter...
Code-switching (CS) in spoken language is where the speech has two or more languages within an utter...
Recent breakthroughs in automatic speech recognition (ASR) have resulted in a word error rate (WER) ...
Adapting Automatic Speech Recognition (ASR) models to new domains results in a deterioration of perf...
This paper studies a novel pre-training technique with unpaired speech data, Speech2C, for encoder-d...