The recently proposed Conformer architecture has shown state-of-the-art performance in Automatic Speech Recognition by combining convolution with attention to model both local and global dependencies. In this paper, we study how to reduce the complexity of the Conformer architecture under a limited computing budget, leading to a more efficient architecture design that we call Efficient Conformer. We introduce progressive downsampling to the Conformer encoder and propose a novel attention mechanism named grouped attention, allowing us to reduce attention complexity from $O(n^{2}d)$ to $O(n^{2}d / g)$ for sequence length $n$, hidden dimension $d$ and group size parameter $g$. We also experiment with the use of strided multi-head sel...
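The complexity figure quoted above can be checked with a small sketch. The abstract gives only the asymptotic cost, so the grouping scheme used here (concatenating every $g$ consecutive frames along the feature dimension before ordinary scaled dot-product attention) is an assumption, and the function names `group_frames`, `ungroup_frames`, `grouped_attention` and `score_cost` are hypothetical. Under that assumption the score matrix has $(n/g)^{2}$ entries, each a dot product over $gd$ features, which gives $n^{2}d/g$ multiply-adds.

```python
import math

def group_frames(x, g):
    """Concatenate every g consecutive frames: (n, d) -> (n/g, g*d). Assumed scheme."""
    if len(x) % g != 0:
        raise ValueError("sequence length must be divisible by g")
    return [sum((x[i + j] for j in range(g)), []) for i in range(0, len(x), g)]

def ungroup_frames(x, g):
    """Inverse of group_frames: (n/g, g*d) -> (n, d)."""
    d = len(x[0]) // g
    return [row[j * d:(j + 1) * d] for row in x for j in range(g)]

def attention(q, k, v):
    """Plain single-head scaled dot-product attention on lists of lists."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)                       # subtract max for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) / z
                    for c in range(len(v[0]))])
    return out

def grouped_attention(q, k, v, g):
    """Attention over n/g grouped positions, then restore the (n, d) layout."""
    grouped = attention(group_frames(q, g), group_frames(k, g), group_frames(v, g))
    return ungroup_frames(grouped, g)

def score_cost(n, d, g=1):
    """Multiply-adds needed for the score matrix: (n/g)^2 * (g*d) = n^2 * d / g."""
    return (n // g) ** 2 * (d * g)

if __name__ == "__main__":
    n, d, g = 8, 4, 2
    x = [[float(i + j) for j in range(d)] for i in range(n)]
    y = grouped_attention(x, x, x, g)
    print(len(y), len(y[0]))                       # output keeps the (n, d) shape
    print(score_cost(n, d), score_cost(n, d, g))   # full vs. grouped score cost
```

With `g = 1` the sketch reduces to ordinary attention, and larger `g` trades temporal resolution in the attention map for a proportional drop in cost.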
Automatic speech recognition research focuses on training and evaluating on static datasets. Yet, as...
As a result of advancement in deep learning and neural network technology, end-to-end models have be...
Research in Automatic Speech Recognition (ASR) has been very intense in recent years with focus give...
This study addresses robust automatic speech recognition (ASR) by introducing a Conformer-based acou...
Optimization of modern ASR architectures is among the highest priority tasks since it saves many com...
Conformer has proven to be effective in many speech processing tasks. It combines the benefits of ex...
Training deep neural network based Automatic Speech Recognition (ASR) models often requires thousand...
The Transformer architecture model, based on self-attention and multi-head attention, has achieved r...
Reducing the latency and model size has always been a significant research problem for live Automati...
While transformers and their variant conformers show promising performance in speech recognition, th...
Transformer-based models excel in speech recognition. Existing efforts to optimize Transformer infer...
Most state-of-the-art Deep Learning (DL) approaches for speaker recognition work on a short utterance...
Convolutional neural networks (CNN) and Transformer have wildly succeeded in multimedia applications...
Recently, it has been argued that encoder-decoder models can be made more interpretable by replacing...
Conformers have recently been proposed as a promising modelling approach for automatic speech recogn...