Transformers have become popular in building end-to-end automatic speech recognition (ASR) systems. However, transformer ASR systems are usually trained to produce output sequences in left-to-right order, disregarding the right-to-left context. Existing transformer-based ASR systems that employ two decoders for bidirectional decoding are complex in terms of computation and optimization. The existing ASR transformer with a single decoder for bidirectional decoding requires extra methods (such as a self-mask) to resolve the problem of information leakage in the attention mechanism. This paper explores different options for the development of a speech transformer that utilizes a single decoder equipped with bidirectional context...
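The self-mask mentioned in the abstract above can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: it contrasts a standard causal (left-to-right) attention mask with a diagonal "self-mask" that lets every position attend bidirectionally to all other positions while blocking attention to itself, which is one way to prevent a token's own representation from leaking into its prediction. The function names (`causal_mask`, `self_mask`, `masked_attention_scores`) are hypothetical.

```python
import numpy as np

def causal_mask(n):
    # Standard left-to-right mask: position i may attend only to positions <= i.
    return np.tril(np.ones((n, n), dtype=bool))

def self_mask(n):
    # Hypothetical self-mask for single-decoder bidirectional decoding:
    # each position may attend to every OTHER position, but not to itself,
    # so its own token embedding cannot leak into its own prediction.
    return ~np.eye(n, dtype=bool)

def masked_attention_scores(scores, mask):
    # Set disallowed positions to -inf so softmax assigns them zero weight.
    masked = np.where(mask, scores, -np.inf)
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

n = 4
scores = np.zeros((n, n))          # uniform scores, for illustration only
weights = masked_attention_scores(scores, self_mask(n))
```

With uniform scores, each row of `weights` spreads attention evenly over the other positions and places zero weight on the diagonal, whereas `causal_mask` restricts the first position to attending only to itself.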
One possible explanation for RNN language models' outsized effectiveness in voice recognition is their...
Ebru Arısoy (MEF Author): Recurrent neural network language models have enjoyed great su...
We introduce dual-decoder Transformer, a new model architecture that jointly p...
As a result of advancement in deep learning and neural network technology, end-to-end models have be...
Scene Text Recognition (STR) is the problem of recognizing the correct word or character sequence in...
In this thesis, we propose a new approach for Speech-to-Text translation, where thanks to an efficie...
Transformer models are powerful sequence-to-sequence architectures that are capable of directly mapp...
End-to-End automatic speech recognition (ASR) models aim to learn generalised representations of spe...
Transformer models are now widely used for speech processing tasks due to their powerful sequence mo...
Currently, there are mainly three Transformer-encoder-based streaming End-to-End (E2E) Automatic Spe...
Training deep neural network based Automatic Speech Recognition (ASR) models often requires thousand...
The dominant neural machine translation (NMT) models apply unified attentional encoder-decoder neura...
Combining outputs of speech recognizers is a known way of increasing speech re...
In transfer learning, two major activities, i.e., pretraining and fine-tuning, are carried out to pe...
The combination of Automatic Speech Recognition (ASR) systems generally relies...