Complex video analysis is a challenging problem due to the long and sophisticated temporal structure of unconstrained videos. This paper introduces a pooled-feature representation (PFR), derived from a double-layer encoding (DLE) framework, to address this problem. Considering that a complex video is composed of a sequence of simple frames, the first layer generates temporal sub-volumes from the video and represents them individually. The second layer constructs the pool of features by fusing the represented vectors from the first layer. The pool is compressed and then encoded to produce a video-parts vector (VPV). This framework allows distilling the representation and extracting new information in a hierarchical way. Compared with rece...
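The two-layer flow described in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how sub-volumes are represented, how the pool is compressed, or how it is encoded. The sketch therefore assumes mean pooling for the first layer, selection of the highest-variance dimensions as a stand-in for compression, and concatenated mean and max pooling as a stand-in for encoding. All function names and the `length` and `keep` parameters are hypothetical.

```python
from statistics import fmean


def split_into_subvolumes(frames, length):
    """Cut a list of per-frame feature vectors into consecutive temporal chunks."""
    return [frames[i:i + length] for i in range(0, len(frames), length)]


def mean_pool(vectors):
    """Average equal-length vectors dimension by dimension."""
    return [fmean(list(dim)) for dim in zip(*vectors)]


def max_pool(vectors):
    """Take the per-dimension maximum over equal-length vectors."""
    return [max(dim) for dim in zip(*vectors)]


def variance(values):
    """Population variance of a sequence of numbers."""
    m = fmean(values)
    return fmean([(v - m) ** 2 for v in values])


def compress(pool, keep):
    """Assumed compression step: keep the `keep` highest-variance dimensions."""
    dims = [list(d) for d in zip(*pool)]
    ranked = sorted(range(len(dims)), key=lambda d: variance(dims[d]), reverse=True)
    kept = sorted(ranked[:keep])  # restore the original dimension order
    return [[vec[d] for d in kept] for vec in pool]


def encode_video(frames, length=2, keep=2):
    """Hypothetical double-layer encoding of one video."""
    # Layer 1: represent each temporal sub-volume individually (assumed: mean pooling).
    pool = [mean_pool(sv) for sv in split_into_subvolumes(frames, length)]
    # Layer 2: compress the pool, then encode it into a single vector
    # (assumed: mean and max statistics concatenated).
    compressed = compress(pool, keep)
    return mean_pool(compressed) + max_pool(compressed)


if __name__ == "__main__":
    # Four frames with three-dimensional toy features.
    toy_frames = [[1.0, 0.0, 5.0], [3.0, 0.0, 5.0], [5.0, 0.0, 1.0], [7.0, 0.0, 1.0]]
    print(encode_video(toy_frames, length=2, keep=2))
```

The output vector has `2 * keep` entries regardless of video length, which is the property a fixed-size video-level descriptor needs. A real system would replace each assumed step with the method the paper actually specifies.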
Human action recognition plays a crucial role in visual learning applications such as video understa...
In this report, we first present a general framework for video structure and content analysis. In th...
In this paper, we tackle the problem of combining features extracted from video for complex event r...
Recently, newly invented features (e.g. Fisher vector, VLAD) have achieved state-of-the-art performa...
Common video representations often deploy an average or maximum pooling of pre-extracted frame featu...
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Comp...
This paper suggests the idea to model video information as a concatenation of different recurring so...
Witnessing the omnipresence of ever more complex yet so intuitive digital video media, research community...
We address the problem of video face retrieval in TV-Series, which searches video clips b...
Witnessing the omnipresence of digital video media, the research community has raised the question o...
Feature ranking from video-wide temporal evolution brings reliable information for complex action re...
Video data exhibits a variety of structures: pixels exhibit spatial structure, e.g., the same class ...
Recognition of complex events in consumer-uploaded Internet videos, captured under real-world setting...
Real-world videos often contain dynamic backgrounds and evolving people activities, especi...
In this thesis, we investigate different representations and models for large-scale video understand...