Automatic speaker recognition algorithms typically use pre-defined filterbanks, such as Mel-Frequency and Gammatone filterbanks, for characterizing speech audio. However, it has been observed that the features extracted using these filterbanks are not resilient to diverse audio degradations. In this work, we propose a deep learning-based technique to deduce the filterbank design from vast amounts of speech audio. The purpose of such a filterbank is to extract features robust to non-ideal audio conditions, such as degraded, short duration, and multi-lingual speech. To this effect, a 1D convolutional neural network is designed to learn a time-domain filterbank called DeepVOX directly from raw speech audio. Secondly, an adaptive triplet mining...
Learning representation from audio data has shown advantages over the handcrafted features such as m...
The automatic recognition of sound events by computers is an important aspect of emerging applicatio...
This paper proposes a Deep Learning (DL) based Wiener filter estimator for speech enhancement in the...
Deep learning techniques such as deep feedforward neural networks and deep convolutional neural netw...
International audienceModern automatic speaker verification relies largely on deep neural networks (...
In real world environments, the speech signals received by our ears are usually a combination of dif...
International audienceMost of the speech processing applications use triangular filters spaced in me...
Recent progress in deep learning has revolutionized speech recognition research, with Deep Neural Ne...
Deepfakes, algorithms that use Machine Learning (ML) to generate fake yet realistic content, represe...
While deep neural networks are now used in almost every component of a speech recognition system, fr...
Speech emotion recognition is a challenging task in speech processing field. For this reason, featur...
Approximately 1.2% of the world's population has impaired voice production. As a result, automatic d...
Various ambient noises always corrupt the audio obtained in real-world environments, which partially...
Deep learning models have improved cutting-edge technologies in many research areas, but their black...
Speech enhancement is the task that aims to improve the quality and the intelligibility of a speech ...
Learning representation from audio data has shown advantages over the handcrafted features such as m...
The automatic recognition of sound events by computers is an important aspect of emerging applicatio...
This paper proposes a Deep Learning (DL) based Wiener filter estimator for speech enhancement in the...
Deep learning techniques such as deep feedforward neural networks and deep convolutional neural netw...
International audienceModern automatic speaker verification relies largely on deep neural networks (...
In real world environments, the speech signals received by our ears are usually a combination of dif...
International audienceMost of the speech processing applications use triangular filters spaced in me...
Recent progress in deep learning has revolutionized speech recognition research, with Deep Neural Ne...
Deepfakes, algorithms that use Machine Learning (ML) to generate fake yet realistic content, represe...
While deep neural networks are now used in almost every component of a speech recognition system, fr...
Speech emotion recognition is a challenging task in speech processing field. For this reason, featur...
Approximately 1.2% of the world's population has impaired voice production. As a result, automatic d...
Various ambient noises always corrupt the audio obtained in real-world environments, which partially...
Deep learning models have improved cutting-edge technologies in many research areas, but their black...
Speech enhancement is the task that aims to improve the quality and the intelligibility of a speech ...
Learning representation from audio data has shown advantages over the handcrafted features such as m...
The automatic recognition of sound events by computers is an important aspect of emerging applicatio...
This paper proposes a Deep Learning (DL) based Wiener filter estimator for speech enhancement in the...