Multiple F0 estimation in vocal ensembles using convolutional neural networks

Cuesta, Helena
McFee, Brian
Gómez Gutiérrez, Emilia, 1975-

Publication date

January 2020

Publisher

International Society for Music Information Retrieval (ISMIR)

Abstract

Paper presented at the International Society for Music Information Retrieval Conference, held virtually from 11 to 16 October 2020.

This paper addresses the extraction of multiple F0 values from polyphonic and a cappella vocal performances using convolutional neural networks (CNNs). We address two major challenges of ensemble singing: all melodic sources are vocals, and the singers sing in harmony. We build upon an existing architecture to produce a pitch salience function of the input signal, using the harmonic constant-Q transform (HCQT) and its associated phase differentials as the input representation. The pitch salience function is subsequently thresholded to obtain a multiple F0 estimation output. For trainin...
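To make the final thresholding step concrete, here is a minimal Python sketch. It assumes the network outputs a salience map of shape (frequency bins, frames) with values in [0, 1] on a log-frequency grid. The grid (60 bins per octave over 6 octaves from 32.7 Hz), the local-maximum peak picking, the function names `bin_frequencies` and `salience_to_multif0`, and the 0.3 threshold are all hypothetical choices for illustration. None of them are taken from the paper.

```python
import numpy as np

# Hypothetical log-frequency grid (an assumption, not the paper's values):
# 60 bins per octave (5 per semitone) over 6 octaves starting at 32.7 Hz (C1).
BINS_PER_OCTAVE = 60
N_OCTAVES = 6
FMIN = 32.7


def bin_frequencies(n_bins=BINS_PER_OCTAVE * N_OCTAVES, fmin=FMIN,
                    bins_per_octave=BINS_PER_OCTAVE):
    """Center frequency in Hz of each log-spaced salience bin."""
    return fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)


def salience_to_multif0(salience, threshold=0.3, freqs=None):
    """Convert a (n_bins, n_frames) salience map into one F0 list per frame.

    A bin is reported as an active F0 if it is a local maximum along the
    frequency axis AND its salience is at least `threshold`
    (hypothetical default). Several F0s can be active in the same frame,
    one per detected voice.
    """
    salience = np.asarray(salience, dtype=float)
    if freqs is None:
        freqs = bin_frequencies(salience.shape[0])
    # Pad the frequency axis with -inf so the lowest and highest bins
    # can also qualify as peaks.
    padded = np.pad(salience, ((1, 1), (0, 0)), constant_values=-np.inf)
    center = padded[1:-1]
    # Strictly greater than the lower neighbour and >= the upper one,
    # so a flat two-bin plateau yields a single peak.
    is_peak = (center > padded[:-2]) & (center >= padded[2:])
    active = is_peak & (salience >= threshold)
    return [freqs[active[:, t]].tolist() for t in range(salience.shape[1])]


# Usage: two voices in frame 0, one sub-threshold bin in frame 1.
sal = np.zeros((BINS_PER_OCTAVE * N_OCTAVES, 2))
sal[120, 0] = 0.9  # two octaves above FMIN
sal[180, 0] = 0.8  # three octaves above FMIN
sal[200, 1] = 0.1  # below the threshold, so it is discarded
f0s = salience_to_multif0(sal)
print(f0s)
```

Local-maximum picking followed by a fixed threshold is only one simple way to turn a salience map into a set of F0s. The threshold trades missed notes against false positives and would normally be tuned on validation data.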