Vocoders are models capable of transforming a low-dimensional spectral representation of an audio signal, typically the mel spectrogram, into a waveform. Modern speech generation pipelines use a vocoder as their final component. Recent vocoder models developed for speech achieve a high degree of realism, so it is natural to ask how they would perform on music signals. Compared to speech, the heterogeneity and structure of the musical sound texture pose new challenges. In this work we focus on one specific artifact that some vocoder models designed for speech tend to exhibit when applied to music: the perceived instability of pitch when synthesizing sustained notes. We argue that the characteristic sound of this artifact is due to th...
animal vocalizations, exhibit two key properties setting them apart from artificial pitch stimuli co...
This paper considers the problem of obtaining an accurate spectral representation of speech formant ...
A model of pitch perception, called the Spatial Pitch Network or SPINET model, is developed and anal...
This paper presents a convolutional neural network (CNN) that uses input from a polyphonic pitch est...
Additive analysis-synthesis using the phase vocoder is a powerful tool for the exploration of musica...
This dissertation presents two extensions to the phase vocoder method of sound analysis and synthesi...
For a broad range of sound transformations, quality is measured according to the common expectation ...
A vocoder is a conditional audio generation model that converts acoustic features such as mel-spectr...
Playing a recorded audio signal at a faster rate than its original rate results not only in shortening...
This work explores the possibility of modeling tonal stability from music signals, in an attempt to ...
The use of the mel spectrogram as a signal parameterization for voice generation is quite recent and...
Neural coding of the pitch of complex sounds is vital for animals' ability to communicate and to per...
The paper examines the use of a Convolutional Bidirectional Recurrent Neural Network (CBRNN) for a p...
Transformation of polyphonic audio in terms of pitch shifting and time stretching is often desirable...
Generative Adversarial Networks (GANs) currently achieve the state-of-the-art sound synthesis qualit...