A Visually Grounded Speech model is a neural model trained to embed image–caption pairs closely together in a common embedding space. As a result, such a model can retrieve semantically related images given a speech caption, and vice versa. The purpose of this research is to investigate whether and how a Visually Grounded Speech model can recognise individual words. Literature on word recognition in humans, Automatic Speech Recognition, and Visually Grounded Speech models was evaluated. Techniques used to analyse human speech processing, such as gating and priming, were taken as inspiration for the design of the experiments used in this thesis. Multiple aspects of word recognition were investigated through three experiments. First...
Speech is at the core of human communication. Speaking and listening come so naturally to us that we do...
Visually grounded speech representation learning has been shown to be useful in the field of speech repre...
There is now considerable evidence that fine-grained acoustic-phonetic detail in the speech signal h...
We investigated word recognition in a Visually Grounded Speech model. The model has been trained on ...
Many computational models of speech recognition assume that the set of target words is already given...
In this paper, we study how word-like units are represented and activated in a...
A set of recorded isolated nouns, verbs and image annotations used for testing the word recognition...
Humans learn language by interaction with their environment and listening to other humans. It should...
In everyday life, speech is all around us, on the radio, television, and in human-human interaction....
Computational models can reflect the complexity of human behaviour by implementing multiple constrai...
All words of the languages we know are stored in the mental lexicon. Psycholinguistic models describ...
This paper describes a study in which we compare human and automatic recognition of words in fluent ...