The success of supervised deep learning methods is largely due to their ability to learn relevant features from raw data. Deep Neural Networks (DNNs) trained on large-scale datasets can capture a diverse set of features and learn a representation that generalizes to unseen tasks and datasets from the same domain. Hence, these models can be used as powerful feature extractors, in combination with shallower models as classifiers, for smaller tasks and datasets where the training data is insufficient for learning an end-to-end model from scratch. In recent years, Convolutional Neural Networks (CNNs) have largely been the method of choice for audio processing. However, recently attention-based tran...
This work presents a text-to-audio-retrieval system based on pre-trained text and spectrogram transf...
Deep learning has fueled an explosion of applications, yet training deep neural networks usually req...
Audio classification plays a crucial role in speech and sound processing tasks with a wide range of ...
Audio Spectrogram Transformer models rule the field of Audio Tagging, outrunning previously dominati...
The great success of transformer-based models in natural language processing (NLP) has led to variou...
Methods for extracting audio and speech features have been studied since pioneering work on spectrum...
Deep neural networks have been recently shown to capture intricate information transformation of sig...
Convolutional Neural Networks (CNNs) have been applied to diverse machine learning tasks for differen...
While deep neural networks have shown impressive results in automatic speaker recognition and relate...
The main objective of this work is to investigate how a deep convolutional neural network (CNN) perf...
We study the usability of pre-trained weakly supervised audio tagging (AT) models as feature extract...
We present Multiscale Audio Spectrogram Transformer (MAST) for audio classification, which brings th...
In recent years deep neural networks (DNNs) have become a popular choice for audio content analysis....
We propose a neural audio generative model, MDCTNet, operating in the perceptually weighted domain o...