Given the similarity between music synthesis and speech synthesis from symbolic input, and the rapid development of text-to-speech (TTS) techniques, it is worthwhile to explore ways to improve MIDI-to-audio performance by borrowing from TTS. In this study, we analyze the shortcomings of a TTS-based MIDI-to-audio system and improve it in terms of feature computation, model selection, and training strategy, aiming to synthesize highly natural-sounding audio. Moreover, we conduct an extensive model evaluation through listening tests, pitch measurement, and spectrogram analysis. This work not only demonstrates synthesis of highly natural music but also offers a thorough analytical approach and useful outcomes for the community. Our code and pre...
We present the Neural Waveshaping Unit (NEWT): a novel, lightweight, fully causal approach to neural...
Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program i...
We explore the use of neural synthesis for acoustic guitar from string-wise MIDI input. We propose f...
This is the pretrained model for our paper submitted to ICASSP 2023: "CAN KNOWLEDGE OF END-TO-END TE...
We present a non-supervised approach to optimize and evaluate the synthesis of non-speech audio effe...
We present work in progress on TimbreCLIP, an audio-text cross modal embedding trained on single ins...
The rise of deep learning algorithms has led many researchers to withdraw from using classic signal ...
Foley sound synthesis refers to the creation of authentic, diegetic sound effects for media, such as...
Recent progress in deep learning for audio synthesis opens the way to models t...
We present a neural vocoder designed with low-powered Alternative and Augmentative Communication dev...
The mainstream neural text-to-speech (TTS) pipeline is a cascade system, including an acoustic model (...
How can we provide interfaces to synthesis algorithms that will allow us to manipulate timbre direct...
Text-to-speech synthesis (TTS) has progressed to such a stage that given a large, clean, phoneticall...
This paper integrates a classic mel-cepstral synthesis filter into a modern neural speech synthesis ...
Text-to-audio generation (TTA) produces audio from a text description, learning from pairs of audio ...