To extract accurate speaker information for text-independent speaker verification, temporal dynamic CNNs (TDY-CNNs), which adapt kernels to each time bin, were proposed. However, the model size of TDY-CNNs is too large and the adaptive kernel's degree of freedom is limited. To address these limitations, we propose decomposed temporal dynamic CNNs (DTDY-CNNs), which form a time-adaptive kernel by combining a static kernel with a dynamic residual based on matrix decomposition. The proposed DTDY-ResNet-34(x0.50), using attentive statistical pooling without data augmentation, achieves an EER of 0.96%, which is better than other state-of-the-art methods. DTDY-CNNs are a successful upgrade of TDY-CNNs, reducing the model size by 64% and enhancing the performance. We showed tha...
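The idea of a time-adaptive kernel built from a static kernel plus a dynamic residual can be illustrated with a small NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the formula, so the residual is assumed here to take the low-rank form P Φ(t) Qᵀ, the kernel is simplified to a pointwise (1x1) convolution over time, and the names `W0`, `P`, `Q`, and `A` (a hypothetical linear map that produces Φ(t) from the frame at time bin t) are all invented for this example.

```python
import numpy as np

def decomposed_dynamic_pointwise_conv(x, W0, P, Q, A):
    """Apply a time-adaptive pointwise kernel W(t) = W0 + P @ Phi(t) @ Q.T.

    Shapes (all hypothetical, chosen for illustration):
      x  : (C_in, T)     input features over T time bins
      W0 : (C_out, C_in) static kernel shared by all time bins
      P  : (C_out, L)    static projection out of the latent space
      Q  : (C_in, L)     static projection into the latent space
      A  : (L*L, C_in)   assumed linear map producing Phi(t) from frame t
    """
    L = P.shape[1]
    T = x.shape[1]
    y = np.empty((W0.shape[0], T))
    for t in range(T):
        # Small L x L dynamic matrix computed from the current time bin.
        phi_t = (A @ x[:, t]).reshape(L, L)
        # Static kernel plus low-rank dynamic residual.
        W_t = W0 + P @ phi_t @ Q.T
        y[:, t] = W_t @ x[:, t]
    return y

rng = np.random.default_rng(0)
C_in, C_out, L, T = 8, 6, 2, 5
x = rng.standard_normal((C_in, T))
W0 = rng.standard_normal((C_out, C_in))
P = rng.standard_normal((C_out, L))
Q = rng.standard_normal((C_in, L))
A = rng.standard_normal((L * L, C_in))

y = decomposed_dynamic_pointwise_conv(x, W0, P, Q, A)
# With a zero dynamic matrix the layer reduces to an ordinary static convolution.
y_static = decomposed_dynamic_pointwise_conv(x, W0, P, Q, np.zeros_like(A))
```

In this sketch only the L x L matrix Φ(t) changes per time bin, while `W0`, `P`, and `Q` are shared, which is one way a decomposition of this kind can keep the parameter count well below that of storing several full basis kernels.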
Speaker adaptive training (SAT) is a well studied technique for Gaussian mixture acoustic models (GM...
Deep learning and neural network research has grown significantly in the fields of automatic speech ...
...phoneme recognition which is characterized by two important properties: 1) Using a 3 layer arrangement...
Convolutional neural networks (CNNs) have significantly promoted the development of speaker verifica...
Current speaker verification techniques rely on a neural network to extract speaker representations....
This paper presents an improved deep embedding learning method based on convolutional neural network...
This paper describes the IDLab submission for the text-independent task of the Short-duration Speake...
Time delay neural networks (TDNNs) are an effective acoustic model for large vocabulary speech recog...
While deep neural networks have shown impressive results in automatic speaker recognition and relate...
• Implement a high-accuracy text-dependent/short-duration speaker id system • Exploit Deep Neural Ne...
Under the short utterance environment, the total variability space underestimates the distribution o...
Learning an effective speaker representation is crucial for achieving reliable performance in speake...
Despite achieving satisfactory performance in speaker verification using deep neural networks, varia...
In speaker recognition tasks, convolutional neural network (CNN)-based approaches have shown signifi...