Phonetic information is one of the most essential components of a speech signal, playing an important role in many speech processing tasks. However, it is difficult to integrate phonetic information into speaker verification systems since it occurs primarily at the frame level while speaker characteristics typically reside at the segment level. In deep neural network-based speaker verification, existing methods only apply phonetic information to the frame-wise trained speaker embeddings. To address this weakness, this paper proposes phonetic adaptation and hybrid multi-task learning and further combines these into c-vector and simplified c-vector architectures. Experiments on National Institute of Standards and Technology (NIST) speaker re...
Speaker recognition is one of the widely used topics in the field of speech technology, many r...
The goal of this thesis is to improve current state-of-the-art techniques in speaker verification (S...
In this paper, a hierarchical attention network is proposed to generate utterance-level embeddings (...
In the recent past, deep neural networks became the most successful approach to extract the speaker ...
Advancements in automatic speaker verification (ASV) can be considered to be primarily limited to im...
This paper presents an improved deep embedding learning method based on convolutional neural network...
While the use of deep neural networks has significantly boosted speaker recognition performance, it ...
This paper explores three novel approaches to improve the performance of speaker verification (SV) s...
In this paper we investigate the use of deep neural networks (DNNs) for a small footprint text-depen...
The objective of this work is to study state-of-the-art deep neural network-based speaker verificat...
Effective speaker identification is essential for achieving robust speaker recognition in real-world...
This paper presents the SJTU system for both text-dependent and text-independent tasks in short-dura...
Speaker verification (SV) is the task of verifying a claimed identity from a voice signal. A well-perfo...
In recent years, deep neural network models gained popularity as a modeling approach for many speech...
Model-based approaches to Speaker Verification (SV), such as Joint Factor Analysis (JFA), i-vector a...