Advanced image-based application systems such as image retrieval and visual question answering depend heavily on semantic image region annotation. However, improvements in image region annotation are limited because of our inability to understand how humans, the end users, process these images and image regions. In this work, we expand a framework for capturing image region annotations where interpreting an image is influenced by the end user's visual perception skills, conceptual knowledge, and task-oriented goals. Human image understanding is reflected by individuals' visual and linguistic behaviors, but the meaningful computational integration and interpretation of their multimodal representations (e.g., gaze, text) remain a challen...
A photograph typically depicts an aspect of the real world, such as an outdoor landscape, a portrai...
Despite progress in perceptual tasks such as image classification, computers still perform poorly on...
We posit that user behavior during natural viewing of images contains an abundance of information a...
When speakers describe an image, they tend to look at objects before mentioning them. In this paper,...
Texts and images provide alternative, yet orthogonal views of the same underlying cognitive concept....
We present a model that generates natural language descriptions of images and their regions. Our ap...
Providing image annotations is a tedious task. This becomes even more cumbersome when objects shall ...
Tagged image regions are valuable meta-information that can support users' various activities such...
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer...
We explore the way in which people look at images of different semantic categories (e.g., handshake,...
"This research explores the interaction of textual and photographic information in an integrated tex...
In the visual world paradigm, participants are more likely to fixate a visual referent that has some...
With the development of society, both industry and academia draw increasing attention to multimedia ...