The thesis is a replication of the work by Takaaki Hori and his colleagues (2019), which introduces a new method to train end-to-end automatic speech recognition (ASR) models using unpaired speech. In general, large amounts of paired data (speech and text) are needed to train an end-to-end automatic speech recognition system. To alleviate the problem of limited paired data, the idea of cycle-consistency losses has been proposed recently in areas such as machine translation and computer vision. In ASR, cycle-consistency training is achieved by building a reverse system, e.g., a text-to-speech system, and designing a loss based on the reconstructed signal and the original one. However, it is not straightforward to apply cycle-consistency in A...
The computational resources were provided by Aalto ScienceIT. We are grateful for the Academy of Fin...
Unsupervised acoustic modeling can offer a cost- and time-effective way of creating a solid acoustic ...
Automatic speech recognition (ASR) decodes speech signals into text. While ASR can produce accurate ...
Consistency regularization has recently been applied to semi-supervised sequence-to-sequence (S2S) a...
As a result of advances in deep learning and neural network technology, end-to-end models have be...
Adapting Automatic Speech Recognition (ASR) models to new domains results in a deterioration of perf...
This paper studies a novel pre-training technique with unpaired speech data, Speech2C, for encoder-d...
Training an automatic speech recognition (ASR) post-processor based on sequence-to-sequence (S2S) re...
ASR error correction continues to serve as an important part of post-processing for speech recogniti...
Training deep neural network based Automatic Speech Recognition (ASR) models often requires thousand...
Adapting a trained Automatic Speech Recognition (ASR) model to new tasks results in catastrophic for...
Self-supervised learning from raw speech has been proven beneficial to improve...
Training domain-specific automatic speech recognition (ASR) systems requires a suitable amount of da...
The performance of speech recognition systems in translating voice to text is still an issue in la...
Over the past decades, the dominant approach towards building automatic speech recognition (ASR) sys...