International audience. Contrastive Predictive Coding (CPC), which predicts future segments of speech from past segments, is emerging as a powerful algorithm for representation learning of speech signals. However, it still underperforms other methods on unsupervised evaluation benchmarks. Here, we introduce WavAugment, a time-domain data augmentation library, and find that applying augmentation to the past segments is generally more efficient and yields better performance than other methods. We find that a combination of pitch modification, additive noise, and reverberation substantially increases the performance of CPC (relative improvement of 18-22%), beating the reference Libri-light results with 600 times less data. Using an out-of-domain da...
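As an illustration of augmenting only the past, the following minimal sketch adds white noise at a chosen signal-to-noise ratio to the past portion of a waveform and leaves the future portion untouched. It does not use WavAugment and is not the authors' implementation: the function names, the `split` index, the Gaussian noise source, and the 10 dB default are all assumptions made for this example, and the input is assumed to be a non-empty list of samples.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng):
    """Return the signal plus white Gaussian noise scaled to the requested SNR in dB."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR(dB) = 10 * log10(signal_power / noise_power), solved for the noise power.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def augment_past_only(waveform, split, snr_db=10.0, seed=0):
    """Hypothetical helper: add noise to waveform[:split] and keep waveform[split:] clean."""
    rng = random.Random(seed)
    past, future = waveform[:split], waveform[split:]
    return add_noise_at_snr(past, snr_db, rng) + list(future)


# Toy input: 200 samples of a sine wave; the first 150 samples play the role of the past.
wave = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
augmented = augment_past_only(wave, split=150)
```

In this sketch the future samples, which would serve as prediction targets, stay identical to the input, so only the context seen by the predictor is perturbed. Pitch modification and reverberation would need a signal-processing library and are not shown.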
Data augmentation is a technique to generate new training data based on existing data. We evaluate t...
International audience. Self-supervised learning from raw speech has been proven beneficial to improve...
Methods for extracting audio and speech features have been studied since pioneering work on spectrum...
International audience. Recent work on unsupervised contrastive learning of speech representation has ...
International audience. Cross-lingual and multilingual training of Automatic Speech Recognition (ASR) ...
International audience. Unsupervised models of representations based on Contrastive Predictive Coding ...
Self-supervised representation learning (SSRL) has improved the performance on downstream phoneme re...
14 pages, including references and supplementary material. International audience. We introduce a new un...
End-to-end Automatic Speech Recognition (ASR) models are usually trained to optimize the loss of the...
Accepted to 36th Conference on Neural Information Processing Systems (NeurIPS 2022). International aud...
Deep neural networks trained with supervised learning algorithms on large amounts of labeled speech ...
International audience. We introduce a simple neural encoder architecture that can be trained using an...
Temporal regularities in speech, such as interdependencies in the timing of speech events, are thoug...
Learning time-series representations when only unlabeled data or few labeled samples are available c...
Current speech recognition systems uniformly employ short-time spectral analysis, usually over windo...