Current state-of-the-art speech-to-speech translation systems rely heavily on a text representation for the translation. By transcoding speech to text, we lose important information about the characteristics of the voice, such as emotion, pitch, and accent. This thesis examines the possibility of using an LSTM neural network model to translate speech-to-speech without the need for a text representation, that is, by translating the raw audio data directly in order to preserve the characteristics of the voice that are otherwise lost in the text transcoding step of the translation process. As part of this research, we create a data set of phrases suitable for speech-to-speech translation tasks. The thesis results in a proof of concept s...
Text-to-phoneme (TTP) mapping, also called grapheme-to-phoneme (GTP) conversion, defines the process...
This paper presents a speech recognition system that directly transcribes audio data with text, wit...
Machine Translation is the translation of text or speech by a computer with no human involvement. It...
Speech translation is the translation of speech in one language typically to text in another, tradit...
Speech Recognition and Text-to-Text Translation systems have been improving significantly in recent ...
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Comp...
End-to-end (E2E) speech-to-text translation (ST) often depends on pretraining its encoder and/or dec...
Direct speech-to-text translation (ST) is an emerging approach that consists in performing the ST ta...
Speech Recognition is the translation of spoken words into text. Speech recognition invol...
With the rapid development of big data and deep learning, breakthroughs have been made in phonetic a...
Neural transducers have been widely used in automatic speech recognition (ASR). In this paper, we in...