A Visually Grounded Speech model is a neural model trained to embed image–caption pairs closely together in a common embedding space. As a result, such a model can retrieve semantically related images given a speech caption, and vice versa. The purpose of this research is to investigate whether and how a Visually Grounded Speech model can recognise individual words. Literature on word recognition in humans, Automatic Speech Recognition, and Visually Grounded Speech models was evaluated. Techniques used to analyse human speech processing, such as gating and priming, were taken as inspiration for the design of the experiments used in this thesis. Multiple aspects of word recognition were investigated through three experiments. First...
Speech is at the core of human communication. Speaking and listening come so naturally to us that we do...
Visually grounded speech representation learning has been shown to be useful in the field of speech repre...
There is now considerable evidence that fine-grained acoustic-phonetic detail in the speech signal h...
We investigated word recognition in a Visually Grounded Speech model. The model has been trained on ...
Many computational models of speech recognition assume that the set of target words is already given...
In this paper, we study how word-like units are represented and activated in a...
A set of recorded isolated nouns, verbs and image annotations used for testing the word recognition...
Humans learn language by interaction with their environment and listening to other humans. It should...
In everyday life, speech is all around us, on the radio, television, and in human-human interaction....
Computational models can reflect the complexity of human behaviour by implementing multiple constrai...
All words of the languages we know are stored in the mental lexicon. Psycholinguistic models describ...
This paper describes a study in which we compare human and automatic recognition of words in fluent ...