The computer vision community has long focused on classic tasks such as object detection, human attribute classification, and action recognition. While state-of-the-art performance improves every year across a wide range of tasks, it remains a challenge to organize the individual pieces into an integrated system that parses visual scenes and events jointly. In this dissertation, we explore the problem of joint visual scene parsing in a restricted visual Turing test scenario that encourages explicit concept grounding. The goal is to build a scalable computer vision system that leverages advances in individual modules across various tasks and exploits the inherent correlations and constraints among them for a comprehensive unders...
It is a challenging yet crucial task to have a comprehensive understanding of human activities and e...
Powered by deep convolutional networks and large scale visual datasets, modern computer vision syste...
Over the course of the last decades, we have witnessed the significant progress of machine ...
Cross-view video understanding is an important yet under-explored area in computer vision. In this p...
Scene graph parsing aims at understanding an image as a graph where vertices are visual objects (pot...
One of the long-standing problems in artificial intelligence is the development of intelligent agent...
People typically learn through exposure to visual facts associated with linguistic descriptions. For...
Computer vision has undergone major changes over the past five years. Here, we investigate if the ...
A video captures a sequence and interactions of concepts that can be static, for instance, objects o...
Understanding images requires rich background knowledge that is not often written down and hard for ...
Computer vision has made significant progress in locating and recognizing objects in recent decades....
One of the major research topics in computer vision is automatic video scene understanding where the...
Humans can extract rich information from visual scenes, such as the 3D locations of objects and huma...
In computer vision, scene parsing is the problem of labelling every pixel in an image or video with ...