Books are a rich source of both fine-grained information (what a character, an object or a scene looks like) and high-level semantics (what someone is thinking and feeling, and how these states evolve through a story). This paper aims to align books to their movie releases in order to provide rich descriptive explanations for visual content that go semantically far beyond the captions available in current datasets. To align movies and books, we propose a neural sentence embedding that is trained in an unsupervised way from a large corpus of books, as well as a video-text neural embedding for computing similarities between movie clips and sentences in the book. We propose a context-aware CNN to combine information from multiple sources. ...
Recent work has shown that the integration of visual information into text-based models can substant...
Recent research in Deep Learning has sent the quality of results in multimedia tasks rocketing: than...
When dealing with movies, closing the tremendous discontinuity between low-level features and the ri...
Humans spend a large amount of time listening, watching, and reading stories. We argue that the abil...
Film adaptations of novels often visually display in a few shots what is described in many pages of ...
Automatic text alignment is an important problem in natural language processing. It can be used to c...
Audio Description (AD) provides linguistic descriptions of movies and allows visually impaired peopl...
We present a model that generates natural language descriptions of images and their regions. Our ap...
Thesis (Ph. D.)--University of Rochester. Department of Computer Science, 2016. Today we encounter la...
A long-standing goal of artificial intelligence is to enable machines to perceive the visual world a...
Audio description (AD) provides linguistic descriptions of movies and allows visually impa...
Movies and TV are a rich source of diverse and complex video of people, objects, actions and locales...
The art of adapting existing text for the film is a challenge for the scriptwriter. This study was ...