Self-attention-based networks have obtained impressive performance in parallel training and global context modeling. However, it is weak in local dependency capturing, especially for data with strong local correlations such as utterances. Therefore, we will mine linguistic information of the original text based on a semantic dependency and the semantic relationship between nodes is regarded as prior knowledge to revise the distribution of self-attention. On the other hand, given the strong correlation between input characters, we introduce a one-dimensional (1-D) convolution neural network (CNN) producing query(Q) and value(V) in the self-attention mechanism for a better fusion of local contextual information. Then, we migrate this variant ...
This paper introduces an improved duration informed attention neural network (DurIAN-E) for expressi...
Acquiring speech signal in real-world environment is always accompanied by various ambient noises, w...
Conventional statistical parametric speech synthesis relies on decision trees to cluster together si...
Generative adversarial networks (GANs) have shown their superiority for speech enhancement. Neverthe...
Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron2) are proposed and achieve...
Self-attention model has shown its flexibility in parallel computation and the effectiveness on mode...
University of Technology Sydney. Faculty of Engineering and Information Technology.This research stu...
Existing generative adversarial networks (GANs) for speech enhancement solely rely on the convolutio...
Speech Emotion Recognition (SER) has been shown to benefit from many of the recent advances in deep ...
Transformers have achieved state-of-the-art results across multiple NLP tasks. However, the self-att...
Large pretrained language models using the transformer neural network architecture are becoming a do...
Many natural language processing tasks solely rely on sparse dependencies between a few tokens in a ...
Sequence-to-sequence neural networks with attention mechanisms have recently been widely adopted for...
Deep neural networks trained with supervised learning algorithms on large amounts of labeled speech ...
Joint sound event localization and detection (SELD) is an emerging audio signal processing task addi...
This paper introduces an improved duration informed attention neural network (DurIAN-E) for expressi...
Acquiring speech signal in real-world environment is always accompanied by various ambient noises, w...
Conventional statistical parametric speech synthesis relies on decision trees to cluster together si...
Generative adversarial networks (GANs) have shown their superiority for speech enhancement. Neverthe...
Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron2) are proposed and achieve...
Self-attention model has shown its flexibility in parallel computation and the effectiveness on mode...
University of Technology Sydney. Faculty of Engineering and Information Technology.This research stu...
Existing generative adversarial networks (GANs) for speech enhancement solely rely on the convolutio...
Speech Emotion Recognition (SER) has been shown to benefit from many of the recent advances in deep ...
Transformers have achieved state-of-the-art results across multiple NLP tasks. However, the self-att...
Large pretrained language models using the transformer neural network architecture are becoming a do...
Many natural language processing tasks solely rely on sparse dependencies between a few tokens in a ...
Sequence-to-sequence neural networks with attention mechanisms have recently been widely adopted for...
Deep neural networks trained with supervised learning algorithms on large amounts of labeled speech ...
Joint sound event localization and detection (SELD) is an emerging audio signal processing task addi...
This paper introduces an improved duration informed attention neural network (DurIAN-E) for expressi...
Acquiring speech signal in real-world environment is always accompanied by various ambient noises, w...
Conventional statistical parametric speech synthesis relies on decision trees to cluster together si...