Target-Speaker Voice Activity Detection (TS-VAD) utilizes a set of speaker profiles alongside an input audio signal to perform speaker diarization. While its superiority over conventional methods has been demonstrated, the method can suffer from errors in the speaker profiles, as those profiles are typically obtained by running a traditional clustering-based diarization method over the input signal. This paper proposes an extension to TS-VAD, called Profile-Error-Tolerant TS-VAD (PET-TSVAD), which is robust to such speaker profile errors. This is achieved by employing a transformer-based TS-VAD that can handle a variable number of speakers and by further introducing a set of additional pseudo-speaker profiles to handle speakers undetected during the...
Voice activity detection (VAD) algorithms are essential for many speech processing applications, suc...
Over the last few years, deep learning has grown in popularity for speaker verification, identificat...
Voice activity detection (VAD) is a fundamental task in various speech-related applications, such as...
This paper describes the DKU-DukeECE submission to the 4th track of the VoxCeleb Speaker Recognition...
This paper proposes an online target speaker voice activity detection system for speaker diarization...
This paper details our speaker diarization system designed for multi-domain, multi-microphone casual...
Voice trigger detection is an important task, which enables activating a voice assistant when a targ...
Speaker diarization algorithms address the "who spoke when" problem in audio recordings. Algorithms ...
This paper describes the BUCEA speaker diarization system for the 2022 VoxCeleb Speaker Recognition ...
Speaker verification (SV) provides billions of voice-enabled devices with access control, and ensure...
In recent years, the self-supervised learning paradigm has received extensive attention due to its great...
This paper presents the problems and solutions addressed at the JSALT workshop...
Speaker recognition (SR) under mismatched conditions is a challenging task. Speech signal is nonline...
Our focus lies in developing an online speaker diarisation framework which demonstrates robust perfo...
This technical report describes our system for tracks 1, 2 and 4 of the VoxCeleb Speaker Recognition ...