We propose to use a perceptually-oriented domain to improve the quality of text-to-speech generated by deep neural networks (DNNs). We train a DNN that predicts the parameters required for speech reconstruction but whose cost function is calculated in another domain. In this paper, to represent this perceptual domain we extract an approximated version of the Spectro-Temporal Excitation Pattern that was originally proposed as part of a model of hearing speech in noise. We train DNNs that predict band aperiodicity, fundamental frequency and Mel cepstral coefficients and compare generated speech when the spectral cost function is defined in the Mel cepstral, warped log spectrum or perceptual domains. Objective results indicate that the perce...
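The idea of predicting one set of parameters while computing the cost in another domain can be illustrated with a minimal sketch. It covers only the warped log spectrum case, not the perceptual (Spectro-Temporal Excitation Pattern) domain, and it is not the authors' implementation. It assumes the log spectrum on the warped frequency axis is the plain cosine expansion sum_m c_m cos(m w) of the Mel cepstral coefficients; the function names, bin count and cepstral order are hypothetical.

```python
import numpy as np

def cosine_basis(n_bins, n_coefs):
    """Matrix D with D[k, m] = cos(m * w_k), w_k spread over [0, pi].

    Assumed convention: log spectrum = sum_m c_m * cos(m * w) on the warped axis.
    """
    omega = np.linspace(0.0, np.pi, n_bins)
    m = np.arange(n_coefs)
    return np.cos(np.outer(omega, m))  # shape (n_bins, n_coefs)

def log_spectrum(cepstra, basis):
    """Map cepstral frames (frames, n_coefs) to log spectra (frames, n_bins)."""
    return cepstra @ basis.T

def spectral_domain_loss(pred, target, basis):
    """Mean squared error measured in the log-spectral domain."""
    diff = log_spectrum(pred, basis) - log_spectrum(target, basis)
    return float(np.mean(diff ** 2))

def spectral_domain_grad(pred, target, basis):
    """Gradient of the loss with respect to the predicted cepstra.

    The mapping is linear, so the error is projected back through the basis.
    """
    diff = log_spectrum(pred, basis) - log_spectrum(target, basis)
    return 2.0 * (diff @ basis) / diff.size

# Hypothetical sizes: 5 frames, 8 cepstral coefficients, 64 frequency bins.
basis = cosine_basis(n_bins=64, n_coefs=8)
rng = np.random.default_rng(0)
pred = rng.normal(size=(5, 8))
target = rng.normal(size=(5, 8))
loss = spectral_domain_loss(pred, target, basis)
grad = spectral_domain_grad(pred, target, basis)  # same shape as pred
```

The network output stays in the Mel cepstral domain, so speech reconstruction is unchanged; only the error signal that is backpropagated comes from the other domain.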
Speech is a natural way of communicating that does not require us to develop any new skills in order...
Vocal tract length normalisation (VTLN) is well established as a speaker adaptation technique that c...
In this work, we investigate the problem of speaker independent acoustic-to-articulatory inversion (...
In this work, we implement a deep neural network for the text-to-speech system. We have tried differ...
The majority of speech processing algorithms operate only with the spectral magnitude, leaving spectral ...
Deep neural networks (DNN) have recently been shown to give state-of-the-art performance in monaural...
Deep neural networks (DNN) have recently been shown to give state-of-the-art performance in monaura...
This paper investigates deep neural networks (DNN) based on nonlinear feature mapping and statistica...
This letter presents a regression-based speech enhancement framework using deep neural net...
This master thesis describes the implementation and evaluation of a promising approach to speech enh...
In contrast to classical noise reduction methods introduced over the past decades, this work focuses...
Neural network-based models that generate glottal excitation waveforms from acoustic features have b...
Speech enhancement directly using deep neural network (DNN) is of major interest due to the capabili...
Advancements in machine learning techniques have promoted the use of deep neural networks (DNNs) for...
Most deep noise suppression (DNS) models are trained with reference-based losses requiring access to...