Multiple instance learning (MIL) with convolutional neural networks (CNNs) has recently been proposed for weakly labelled audio tagging. However, features from the various CNN filtering channels and spatial regions are often treated equally, which may limit performance in event prediction. In this paper, we propose a novel attention mechanism, namely spatial and channel-wise attention (SCA). We divide spatial attention into global and local submodules: the former captures the event-related spatial regions, and the latter estimates the onset and offset of the events. Considering the variations in CNN channels, channel-wise attention is also exploited to recognize different sound scenes. The proposed SCA can be employed i...
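The abstract above describes reweighting CNN features by channel and by spatial position before pooling them into a clip-level representation. The following is a minimal NumPy sketch of that general idea, not the authors' SCA implementation: the function names, the weight matrix `w`, the projection vector `v`, the use of softmax for both weightings, and the (channels, time, frequency) layout are all assumptions made for illustration, and the local (onset/offset) submodule is not modelled.

```python
import numpy as np

def softmax(x, axis=None):
    # Numerically stable softmax over the given axis (all elements if None).
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def channel_attention(feat, w):
    """Rescale each channel of feat (C, T, F) by a learned weight.

    w is a hypothetical (C, C) weight matrix standing in for a trained layer.
    """
    squeezed = feat.mean(axis=(1, 2))      # (C,) global average per channel
    weights = softmax(w @ squeezed)        # (C,) channel weights, sum to 1
    return feat * weights[:, None, None], weights

def spatial_attention_pool(feat, v):
    """Pool feat (C, T, F) with attention over time-frequency positions.

    v is a hypothetical (C,) projection, equivalent to a 1x1 convolution.
    """
    scores = np.tensordot(v, feat, axes=(0, 0))            # (T, F)
    attn = softmax(scores.ravel()).reshape(scores.shape)   # sums to 1 overall
    pooled = (feat * attn[None, :, :]).sum(axis=(1, 2))    # (C,)
    return pooled, attn

# Random stand-ins for a CNN feature map and trained parameters (assumptions).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 10, 16))    # 8 channels, 10 frames, 16 bins
w = rng.standard_normal((8, 8))
v = rng.standard_normal(8)

reweighted, ch_w = channel_attention(feat, w)
pooled, attn = spatial_attention_pool(reweighted, v)
```

In a weakly labelled setting, `pooled` would feed a clip-level classifier trained on clip-level tags only, while `attn` indicates which time-frequency regions contributed most to the prediction.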
Joint sound event localization and detection (SELD) is an emerging audio signal processing task addi...
The goal of Acoustic Scene Classification (ASC) is to recognise the environment in which an audio w...
In this technical report, we present several methods for Task 4 of Detection and Classificati...
Audio tagging is the task of predicting the presence or absence of sound classes within an audio cli...
Weakly labelled audio tagging aims to predict the classes of sound events within an audio clip, wher...
Audio tagging aims to perform multi-label classification on audio chunks and it is a newly proposed ...
In this paper, we present a gated convolutional neural network and a temporal attention-based locali...
Multiple instance learning (MIL) has recently been used for weakly labelled audio tagging, where the...
In this paper, we propose a multi-level attention model for the weakly labelled audio classification...
Environmental audio tagging is a newly proposed task to predict the presence or absence of a specifi...
We propose a convolutional neural network (CNN) model based on an attention pooling method to classi...
This paper proposes to use low-level spatial features extracted from multichannel audio for sound ev...
Sound event detection (SED) is a problem to detect the onset and offset times of sound events in an ...
Sound event detection (SED) is a task to detect sound events in an audio recording. One challenge of...
The advent of mixed reality consumer products brings about a pressing need to develop and improve sp...