Recent advances in End-to-End (E2E) Spoken Language Understanding (SLU) have been driven primarily by effective pretraining of speech representations. One such pretraining paradigm is the distillation of semantic knowledge from state-of-the-art text-based models such as BERT into speech encoder neural networks. This work is a step towards doing the same in a more efficient and fine-grained manner, aligning speech embeddings and BERT embeddings on a token-by-token basis. We introduce a simple yet novel technique that uses a cross-modal attention mechanism to extract token-level contextual embeddings from a speech encoder, such that these can be directly compared and aligned with BERT-based contextual embeddings. This alignment is performed ...
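To make the recipe concrete, here is a minimal, hypothetical PyTorch sketch of the general mechanism the abstract describes, not the paper's actual implementation: BERT token embeddings serve as queries in a cross-modal attention layer over speech-encoder frame outputs, yielding one speech-side contextual embedding per text token, which is then pulled towards the matching BERT embedding by an InfoNCE-style contrastive loss. All names (`SpeechToTokenAttention`, `token_alignment_loss`), the dimensions, and the specific loss form are assumptions.

```python
# Hypothetical sketch of token-level speech-to-BERT alignment via
# cross-modal attention. Module/function names and the contrastive loss
# are illustrative assumptions, not the paper's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeechToTokenAttention(nn.Module):
    """Cross-modal attention: BERT token embeddings act as queries over
    speech-encoder frame embeddings, producing one speech-side contextual
    embedding per text token."""
    def __init__(self, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, bert_tokens: torch.Tensor, speech_frames: torch.Tensor):
        # bert_tokens:   (B, T_tok, d_model)   queries
        # speech_frames: (B, T_frame, d_model) keys / values
        token_level_speech, _ = self.attn(
            query=bert_tokens, key=speech_frames, value=speech_frames
        )
        return token_level_speech  # (B, T_tok, d_model)

def token_alignment_loss(speech_emb, bert_emb, temperature: float = 0.1):
    """InfoNCE-style loss: each token's speech embedding is pulled toward
    its own BERT embedding and pushed away from all other tokens'."""
    B, T, D = speech_emb.shape
    s = F.normalize(speech_emb.reshape(B * T, D), dim=-1)
    t = F.normalize(bert_emb.reshape(B * T, D), dim=-1)
    logits = s @ t.T / temperature                      # (B*T, B*T) cosine sims
    targets = torch.arange(B * T, device=logits.device)  # matches on diagonal
    return F.cross_entropy(logits, targets)

# Toy usage with random tensors standing in for real encoder outputs.
bert_emb = torch.randn(2, 12, 768)   # (batch, tokens, dim) from BERT
frames = torch.randn(2, 200, 768)    # (batch, frames, dim) from speech encoder
aligner = SpeechToTokenAttention()
speech_tok = aligner(bert_emb, frames)
loss = token_alignment_loss(speech_tok, bert_emb)
```

One appeal of this formulation is that using the text tokens as queries makes the number of speech-side outputs equal the BERT sequence length by construction, so the two sets of contextual embeddings can be compared position-by-position without any explicit forced alignment between frames and tokens.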