Zero-shot audio captioning aims to automatically generate descriptive textual captions for audio content without prior training for this task. Unlike speech recognition, which transcribes audio containing spoken language into text, audio captioning is commonly concerned with ambient sounds or sounds produced by a human performing an action. Inspired by zero-shot image captioning methods, we propose ZerAuCap, a novel framework for summarising such general audio signals in a text caption without requiring task-specific training. In particular, our framework uses a pre-trained large language model (LLM) to generate the text, guided by a pre-trained audio-language model, to produce captions that describe the au...
We propose an audio captioning system that describes non-speech audio signals in the form of natural...
Submitted to Interspeech 2021. arXiv admin note: text overlap with arXiv:2011.11588. International aud...
Automated audio captioning aims to use natural language to describe the content of audio data. This ...
Automated audio captioning is the multimodal task of describing environmental ...
Automated Audio Captioning (AAC) is the task of generating natural language descriptions given an au...
Automated audio captioning, a task that mimics human perception as well as innovatively links audio ...
Mainstream Audio Analytics models are trained to learn under the paradigm of one class label to many...
Automated Audio Captioning (AAC) aims to develop systems capable of describing an audio recording us...
In the past, the rapidly evolving field of sound classification greatly benefited from the applicati...
Audio captioning aims at describing acoustic scenes with natural language. Systems are currently eva...
We introduce Generative Spoken Language Modeling, the task of learning the aco...
Automated Audio Captioning (AAC) involves generating natural language descriptions of audio content,...
Audio captioning is a novel field of multi-modal translation and it is the task of creating a textua...
In this paper, we study zero-shot learning in audio classification via semantic embeddings extracted...
Automated Audio Captioning (AAC) is a cross-modal translation task that aims to use natural language...