We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU), where a streaming automatic speech recognition (ASR) model produces the first-pass hypothesis and a second-pass natural language understanding (NLU) component generates the semantic parse by conditioning on both the ASR text and audio embeddings. By formulating E2E SLU as a generalized decoder, our system is able to support complex compositional semantic structures. Furthermore, the sharing of parameters between ASR and NLU makes the system especially suitable for resource-constrained (on-device) environments. Our proposed approach consistently outperforms strong pipeline NLU baselines by 0.60% to 0.65% on the spoken version of the TOPv2 da...
The topic of spoken language understanding (SLU) has seen a lot of progress over the last three years, with th...
Spoken language understanding (SLU) tasks involve mapping from speech audio signals to semantic labe...
Text-only adaptation of an end-to-end (E2E) model remains a challenging task for automatic speech re...
Automatic speech recognition (ASR) systems typically rely on an external endpointer (EP) model to id...
End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single mo...
Spoken Language Understanding (SLU) is a core task in most human-machine interaction systems. With ...
Spoken Language Understanding (SLU) is a core task in most human-machine inter...
Spoken Language Understanding (SLU) is typically performed through automatic sp...
Voice Assistants such as Alexa, Siri, and Google Assistant typically use a two-stage Spoken Language...
Advancements in deep neural networks have allowed automatic speech recognition (ASR) systems to atta...
Spoken language understanding (SLU) is a task aiming to extract high-level semantics from spoken utt...
Spoken Language Understanding (SLU) is typically performed through automatic speech recognition (ASR...
This work deals with spoken language understanding (SLU) systems in the scenar...
End-to-end formulation of automatic speech recognition (ASR) and speech translation (ST) makes it ea...
Although recent advances in deep learning technology have boosted automatic speech recognition (ASR)...