Audio tagging aims to detect the types of sound events occurring in an audio recording. To tag polyphonic audio recordings, we propose to use a Connectionist Temporal Classification (CTC) loss function on top of a Convolutional Recurrent Neural Network (CRNN) with learnable Gated Linear Units (GLU-CTC), based on a new type of audio label data: Sequentially Labelled Data (SLD). In GLU-CTC, the CTC objective function maps the frame-level probability of labels to the clip-level probability of labels. To compare the mapping ability of GLU-CTC for sound events, we train a CRNN with GLU based on Global Max Pooling (GLU-GMP) and a CRNN with GLU based on Global Average Pooling (GLU-GAP). We also compare the proposed GLU-CTC system with the baseline ...
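The two pooling baselines named in the abstract above, GMP and GAP, both reduce frame-level label probabilities to clip-level probabilities. The following is a minimal sketch of that reduction only, assuming the frame-level output is a frames-by-classes table of probabilities; the function names and the toy values are illustrative assumptions, not taken from the paper, and the CTC mapping itself is not implemented here.

```python
from typing import List


def global_max_pool(frame_probs: List[List[float]]) -> List[float]:
    """Clip-level probability per class: the maximum over all frames."""
    # zip(*frame_probs) transposes frames x classes into classes x frames.
    return [max(class_track) for class_track in zip(*frame_probs)]


def global_avg_pool(frame_probs: List[List[float]]) -> List[float]:
    """Clip-level probability per class: the mean over all frames."""
    return [sum(class_track) / len(class_track) for class_track in zip(*frame_probs)]


# Hypothetical frame-level output: 4 frames, 2 sound-event classes.
frame_probs = [
    [0.1, 0.8],
    [0.9, 0.6],
    [0.2, 0.7],
    [0.0, 0.5],
]

clip_gmp = global_max_pool(frame_probs)  # one strong frame is enough to tag the clip
clip_gap = global_avg_pool(frame_probs)  # short events are diluted by silent frames
```

In this toy input the first class is active in a single frame, so max pooling scores it far higher than average pooling does. That difference in how brief events are treated is the kind of mapping behaviour the abstract's comparison is concerned with.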
Audio tagging has attracted increasing attention over the last decade and has various potential applic...
The objective of this thesis is to investigate how a deep learning model called recurrent neural net...
Estimating the main melody of a polyphonic audio recording remains a challenging task. We approach t...
In this paper we present our audio tagging system for the DCASE 2019 Challenge Task 2. We propose a ...
Sound events often occur in unstructured environments where they exhibit wide variations in their fr...
In this paper, we present a gated convolutional neural network and a temporal attention-based locali...
Environmental audio tagging is a newly proposed task to predict the presence or absence of a specifi...
General-purpose audio tagging refers to classifying sounds that are of a diverse nature, and is rele...
Sequential audio event tagging can provide not only the type information of audio events, but also t...
Sound event detection (SED) is a task to detect sound events in an audio recording. One challenge of...
The task of speech and music detection aims at the automatic annotation of potentially overlapping s...
To detect the class, and start and end times of sound events in real world recordings is a challengi...
Recently, deep recurrent neural networks have achieved great success in various machine learning tas...
Polyphonic sound event detection (SED) is the task of detecting the time stamps and the class of sou...