Manually annotating fine-grained slot-value labels for task-oriented dialogue (ToD) systems is an expensive and time-consuming endeavour. This motivates research into slot-filling methods that operate with limited amounts of labelled data. Moreover, the majority of current work on ToD is based solely on text as the input modality, neglecting the additional challenges of imperfect automatic speech recognition (ASR) when working with spoken language. In this work, we propose a Knowledge-Aware Audio-Grounded generative slot-filling framework, termed KA2G, that focuses on few-shot and zero-shot slot filling for ToD with speech input. KA2G achieves robust and data-efficient slot filling for speech-based ToD by 1) framing it as a text generation ...
Zero-shot cross-domain slot filling aims to transfer knowledge from the labeled source domain to the...
When a natural language generation (NLG) component is implemented in a real-world task-oriented dial...
Self-supervised representation learning (SSRL) has improved the performance on downstream phoneme re...
Although there have been remarkable advances in dialogue systems through the dialogue systems techno...
Speech representations learned from self-supervised learning (SSL) models can benefit various speech...
We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SL...
Spoken dialogue systems typically use predefined semantic slots to parse users' natural language ...
Slot filling is a core operation for utterance understanding in task-oriented dialogue systems. Slot...
Advancements in deep neural networks have allowed automatic speech recognition (ASR) systems to atta...
Knowledge-grounded dialogue systems are challenging to build due to the lack of training data and he...
Performance of spoken language understanding (SLU) can be degraded with automatic speech recognition...
Progress in speech processing has been facilitated by shared datasets and benchmarks. Historically t...
One of the pitfalls in spoken dialogue systems is the brittleness of automatic speech recognition (A...
End-to-end spoken language understanding (SLU) predicts intent directly from audio using a single mo...
Natural language generation from structured data mainly focuses on surface-level descriptions, suffe...