Spoken language understanding (SLU) is a task that aims to extract high-level semantics from spoken utterances. Previous works have investigated the use of speech self-supervised models and textual pre-trained models, which have shown reasonable improvements on various SLU tasks. However, because of the modality mismatch between speech signals and text tokens, previous methods usually require complex framework designs. This work proposes a simple yet effective unsupervised paradigm that connects speech and textual pre-trained models, resulting in an unsupervised speech-to-semantic pre-trained model for various SLU tasks. Specifically, we propose to use unsupervised automatic speech recognition (ASR) as a connector that bridges di...
Self-supervised pre-training could effectively improve the performance of low-resource automatic spe...
Over the past few years, self-supervised learned speech representations have emerged as fruitful rep...
Self-supervised learning (SSL) has shown tremendous success in various speech-related downstream tas...
Unsupervised speech recognition has shown great potential to make Automatic Speech Recognition (ASR)...
How to boost speech pre-training with textual data is an unsolved problem due to the fact that speec...
Self-supervised learning (SSL) methods which learn representations of data without explicit supervis...
Spoken language understanding (SLU) tasks involve mapping from speech audio signals to semantic labe...
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech r...
Self-supervised learning (SSL) achieves great success in speech recognition, while limited explorati...
We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SL...
Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image an...
In Spoken language understanding (SLU), a natural solution is concatenating pre-trained speech model...
Self-supervised speech pre-training empowers the model with the contextual structure inherent in the...
Traditionally, research in automated speech recognition has focused on local-first encoding of audio...
In recent years, speech-based self-supervised learning (SSL) has made significant progress in variou...