This paper addresses the problem of exploiting spatiotemporal information to improve small object detection precision in video. We propose a two-stage object detector called FANet based on short-term spatiotemporal feature aggregation and long-term object linking to refine object detections. First, we generate a set of short tubelet proposals. Then, we aggregate RoI pooled deep features throughout the tubelet using a new temporal pooling operator that summarizes the information with a fixed output size independent of the tubelet length. In addition, we define a double head implementation that we feed with spatiotemporal information for spatiotemporal classification and with spatial information for object localization and spatial classificat...
The development of the Internet makes the number of online videos increase dramatically, which bring...
Inspired by the recent spatio-temporal action localization efforts with tubelets (sequences of bound...
We present a novel spatiotemporal saliency model for object detection in videos. In contrast to prev...
We present a new network architecture able to take advantage of spatio-temporal information availabl...
Object detection through convolutional neural networks is reaching unprecedented levels of precision...
International audienceCurrent state-of-the-art approaches for spatio-temporal action detection rely ...
Abstract. Spatio-temporal detection of actions and events in video is a challeng-ing problem. Beside...
The rise of deep learning has facilitated remarkable progress in video understanding. This thesis ad...
The object detection problem is composed of two main tasks, object localization and object classifi...
With the development of deep neural networks, many object detection frameworks have shown great succ...
Video object detection is a challenging task because of the presence of appearance deterioration in ...
In this work, we propose an approach to the spatiotemporal localisation (detection) and classificati...
We provide a set of generic modifications to improve the execution efficiency of single-shot object ...
Video object co-localization is the task of jointly localizing common visual objects across videos. ...
Recent approaches for high accuracy detection and tracking of object categories in video consist of ...
The development of the Internet makes the number of online videos increase dramatically, which bring...
Inspired by the recent spatio-temporal action localization efforts with tubelets (sequences of bound...
We present a novel spatiotemporal saliency model for object detection in videos. In contrast to prev...
We present a new network architecture able to take advantage of spatio-temporal information availabl...
Object detection through convolutional neural networks is reaching unprecedented levels of precision...
International audienceCurrent state-of-the-art approaches for spatio-temporal action detection rely ...
Abstract. Spatio-temporal detection of actions and events in video is a challeng-ing problem. Beside...
The rise of deep learning has facilitated remarkable progress in video understanding. This thesis ad...
The object detection problem is composed of two main tasks, object localization and object classifi...
With the development of deep neural networks, many object detection frameworks have shown great succ...
Video object detection is a challenging task because of the presence of appearance deterioration in ...
In this work, we propose an approach to the spatiotemporal localisation (detection) and classificati...
We provide a set of generic modifications to improve the execution efficiency of single-shot object ...
Video object co-localization is the task of jointly localizing common visual objects across videos. ...
Recent approaches for high accuracy detection and tracking of object categories in video consist of ...
The development of the Internet makes the number of online videos increase dramatically, which bring...
Inspired by the recent spatio-temporal action localization efforts with tubelets (sequences of bound...
We present a novel spatiotemporal saliency model for object detection in videos. In contrast to prev...