In this paper we evaluate three state-of-the-art neural-network-based approaches for large-scale video classification, where the computational efficiency of the inference step is of particular importance due to the ever-increasing data throughput of video streams. Our evaluation focuses on finding good efficiency vs. accuracy tradeoffs by evaluating different network configurations and parameterizations. In particular, we investigate the use of different temporal subsampling strategies, and show that they can be used to effectively trade computational workload against classification accuracy. Using a subset of the YouTube-8M dataset, we demonstrate that workload reductions on the order of 10x, 50x and 100x can be achieved ...
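To make the workload side of that tradeoff concrete, here is a minimal sketch of one possible temporal subsampling strategy: keeping every k-th frame. The abstract does not say which strategies the authors evaluate, so the uniform stride, the function names `subsample_uniform` and `workload_reduction`, and the 300-frame example are all assumptions made for illustration, not the paper's method.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample_uniform(frames: Sequence[T], factor: int) -> List[T]:
    """Keep every `factor`-th frame, starting from the first (hypothetical helper)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return list(frames[::factor])


def workload_reduction(num_frames: int, factor: int) -> float:
    """Ratio of original to kept frame count, assuming per-frame cost is constant."""
    if num_frames < 1:
        raise ValueError("num_frames must be >= 1")
    if factor < 1:
        raise ValueError("factor must be >= 1")
    kept = len(range(0, num_frames, factor))
    return num_frames / kept


if __name__ == "__main__":
    frames = list(range(300))  # stand-in for 300 per-frame feature vectors
    for factor in (10, 50, 100):
        kept = subsample_uniform(frames, factor)
        print(factor, len(kept), workload_reduction(len(frames), factor))
```

Under the assumption that inference cost scales linearly with the number of frames processed, a stride of k reduces the workload by roughly k; how much accuracy is lost at each stride is an empirical question that the sketch does not address.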
Classifying videos into distinct categories, such as Sport and Music Video, is crucial for multimedi...
Rank pooling is a temporal encoding method that summarizes the dynamics of a video sequence to a sin...
This work addresses the problem of accurate semantic labelling of short videos. To this end, a mult...
Convolutional Neural Networks (CNNs) have been established as a powerful class of models for image ...
Current deep learning based video classification architectures are typically trained end-to-end on l...
Advanced video classification systems decode video frames to derive the necessary texture and motion...
We address the problem of capturing temporal information for video classification in 2D networks, wi...
Video understanding involves problems such as video classification, which consists in labeling video...
Technological innovation in the field of video action recognition drives the development of video-ba...
Despite their great predictive capability, Convolutional Neural Networks (CNNs) are computational-ex...
This paper presents an analysis of the data and compute efficiency of the TemporalMaxer deep learnin...
In temporal action localization, given an input video, the goal is to predict which actions it conta...
In this dissertation, I present my work towards exploring temporal information for better video unde...