Advanced image-based application systems such as image retrieval and visual question answering depend heavily on semantic image region annotation. However, improvements in image region annotation are limited because of our inability to understand how humans, the end users, process these images and image regions. In this work, we expand a framework for capturing image region annotations where interpreting an image is influenced by the end user\u27s visual perception skills, conceptual knowledge, and task-oriented goals. Human image understanding is reflected by individuals\u27 visual and linguistic behaviors, but the meaningful computational integration and interpretation of their multimodal representations (e.g. gaze, text) remain a challen...
This thesis investigates the mechanisms underlying the formation, maintenance, and sharing of refere...
"This research explores the interaction of textual and photographic information in an integrated tex...
Powered by deep convolutional networks and large scale visual datasets, modern computer vision syste...
Advanced image-based application systems such as image retrieval and visual question answering depen...
When speakers describe an image, they tend to look at objects before mentioning them. In this paper,...
Texts and images provide alternative, yet orthogonal views of the same underlying cognitive concept....
We present a model that generates natural language de-scriptions of images and their regions. Our ap...
We present a model that generates natural language de-scriptions of images and their regions. Our ap...
Providing image annotations is a tedious task. This becomes even more cumbersome when objects shall ...
Tagged image regions are a valuable meta information which can support users various activities such...
Linguistic processing in the context of a visual scene triggers characteristic eye-movement response...
We posit that user behavior during natural viewing of im-ages contains an abundance of information a...
In describing images, visual and linguistic processes coordinate with each other and they both proce...
Visual attention mechanism has been widely used by image captioning model in order to dynamically at...
This research explores the interaction of linguistic and photographic information in an integrated t...
This thesis investigates the mechanisms underlying the formation, maintenance, and sharing of refere...
"This research explores the interaction of textual and photographic information in an integrated tex...
Powered by deep convolutional networks and large scale visual datasets, modern computer vision syste...
Advanced image-based application systems such as image retrieval and visual question answering depen...
When speakers describe an image, they tend to look at objects before mentioning them. In this paper,...
Texts and images provide alternative, yet orthogonal views of the same underlying cognitive concept....
We present a model that generates natural language de-scriptions of images and their regions. Our ap...
We present a model that generates natural language de-scriptions of images and their regions. Our ap...
Providing image annotations is a tedious task. This becomes even more cumbersome when objects shall ...
Tagged image regions are a valuable meta information which can support users various activities such...
Linguistic processing in the context of a visual scene triggers characteristic eye-movement response...
We posit that user behavior during natural viewing of im-ages contains an abundance of information a...
In describing images, visual and linguistic processes coordinate with each other and they both proce...
Visual attention mechanism has been widely used by image captioning model in order to dynamically at...
This research explores the interaction of linguistic and photographic information in an integrated t...
This thesis investigates the mechanisms underlying the formation, maintenance, and sharing of refere...
"This research explores the interaction of textual and photographic information in an integrated tex...
Powered by deep convolutional networks and large scale visual datasets, modern computer vision syste...