A defining characteristic of natural vision is its ability to withstand a variety of input alterations, resulting in the creation of an invariant representation of the surroundings. While convolutional neural networks exhibit resilience to certain forms of spatial input variation, modifications in the spatial and temporal aspects can significantly affect the representations of video content in deep neural networks. Inspired by the resilience of natural vision to input variations, we employ a simple multi-stream model to explore its potential to address spatiotemporal changes by including temporal features. Our primary goal is to introduce a video-trained model and evaluate its robustness to diverse image and video inputs, with a particular ...
While recent large-scale video-language pre-training made great progress in video question answering...
Modeling the visual changes that an action brings to a scene is critical for video understanding. Cu...
The tremendous growth in video data, both on the internet and in real life, has encouraged the devel...
Video analysis and understanding have obtained more and more attention in recent years. Th...
Video understanding is one of the fundamental problems in computer vision. Videos provide more infor...
We propose MASTAF, a Model-Agnostic Spatio-Temporal Attention Fusion network for few-shot video clas...
We introduce Continual 3D Convolutional Neural Networks (Co3D CNNs), a new computational formulation...
Prediction of visual saliency in images and video is a highly researched topic...
Deep spatiotemporal models are used in a variety of computer vision tasks, such as action recognitio...
Throughout our life, we humans perceive the visual world, connect what we see over time, and make se...
In this dissertation, I present my work towards exploring temporal information for better video unde...
Visual enhancement is concerned with problems to improve the visual quality and viewing experience f...
Deep neural network representations align well with brain activity in the ventral visual stream. How...
Classification of human actions from real-world video data is one of the most important topics in co...
Deep feedforward neural network models of vision dominate in both computational neuroscience and eng...