This paper proposes a weight regularization method for a faster neural vocoder. Pruning time-consuming DNN modules is a promising way to realize a real-time vocoder on a CPU (e.g. WaveRNN, LPCNet). Regularization that encourages sparsity is also effective in avoiding the quality degradation caused by pruning. However, for fast vocoding the surviving weights must be aligned in contiguous blocks matching the SIMD width. To ensure this alignment, we propose an explicitly SIMD-size-aware regularization. Our proposed method reshapes a weight matrix into a tensor so that the weights are aligned by group size in advance, and then computes a group-Lasso-like regularization loss. Experiments on 70% sparse subband WaveRNN show that pruning in conventional Lasso and column-wise group La...
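The reshape-then-penalize idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `simd_group_lasso`, the use of NumPy, and the choice to group consecutive weights along each row are all assumptions; the key point is that each group of `group_size` contiguous weights is penalized jointly via its L2 norm, so pruning zeroes out whole SIMD-width blocks rather than scattered elements.

```python
import numpy as np

def simd_group_lasso(weight: np.ndarray, group_size: int = 8) -> float:
    """Group-Lasso-like penalty over SIMD-aligned weight groups (sketch).

    Reshapes a (rows, cols) weight matrix so that each run of
    `group_size` consecutive weights in a row forms one group, then
    sums the L2 norms of all groups. Minimizing this drives entire
    groups toward zero together, keeping sparsity SIMD-friendly.
    """
    rows, cols = weight.shape
    assert cols % group_size == 0, "columns must be divisible by the SIMD group size"
    # (rows, cols) -> (rows, num_groups, group_size): weights aligned by group
    groups = weight.reshape(rows, cols // group_size, group_size)
    # L2 norm per group, summed over all groups (group-Lasso-like loss)
    return float(np.sqrt((groups ** 2).sum(axis=-1)).sum())

# Example: every group of 4 ones has norm sqrt(4) = 2; with 2 rows x 2 groups,
# the total penalty is 2 * 2 * 2 = 8.
penalty = simd_group_lasso(np.ones((2, 8)), group_size=4)
```

In a real training loop this term would be scaled by a regularization coefficient and added to the vocoder's task loss, in the same way a conventional (element-wise) Lasso penalty would be.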
Neural models are known to be over-parameterized, and recent work has shown that sparse text-to-spee...
Neural networks are more expressive when they have multiple layers. In turn, conventional training m...
As one of the most popular sequence-to-sequence modeling approaches for speech recognition, the RNN-...
There is growing interest in unifying the streaming and full-context automatic speech recognition (A...
In this paper, we address the challenging task of simultaneously optimizing (i) the weights of a neu...
Sparsity has become one of the promising methods to compress and accelerate Deep Neural Networks (DN...
We present a neural vocoder designed with low-powered Alternative and Augmentative Communication dev...
Long Short-Term Memory (LSTM) recurrent networks are frequently used for tasks involving time-sequen...
DNN-based speaker verification (SV) models demonstrate significant performance at relatively high co...
Deep neural network (DNN) based speech enhancement models have attracted extensive attention due to ...
GAN vocoders are currently one of the state-of-the-art methods for building high-quality neural wave...
The modern paradigm in speech processing has demonstrated the importance of scale and compute for en...
The development of neural vocoders (NVs) has resulted in the high-quality and fast generation of wav...
From wearables to powerful smart devices, modern automatic speech recognition (ASR) models run on a ...