In this paper we propose an unsupervised feature extraction method to capture temporal information on monocular videos, where we detect and encode subject of interest in each frame and leverage contrastive self-supervised (CSS) learning to extract rich latent vectors. Instead of simply treating the latent features of nearby frames as positive pairs and those of temporally-distant ones as negative pairs as in other CSS approaches, we explicitly disentangle each latent vector into a time-variant component and a time-invariant one. We then show that applying contrastive loss only to the time-variant features and encouraging a gradual transition on them between nearby and away frames while also reconstructing the input, extract rich temporal fe...
The power of ConvNets has been demonstrated in a wide variety of vision tasks including pose estimat...
This thesis presents new methods in two closely related areas of computer vision: human pose estimat...
Tracking 3D people from monocular video is often poorly constrained. To mitigate this problem, prior...
We address the task of estimating 3D human poses from monocular camera sequences. Many works make us...
Most 3d human pose estimation methods assume that input – be it images of a scene collected from one...
Monocular 3D object detection continues to attract attention due to the cost benefits and wider avai...
We present a bundle-adjustment-based algorithm for recovering accurate 3D human pose and meshes from...
Estimating 3D poses from a monocular video is still a challenging task, despite the significant prog...
Thesis (Ph.D.)--University of Washington, 2014We propose a system to recognize both isolated and con...
We focus on the problem of automatically extracting the 3D configuration of human poses from 2D imag...
There is growing interest in human activity recognition systems, motivated by their numerous promisi...
This work addresses the challenging problem of unconstrained 3D human pose estimation (HPE) from a n...
This paper introduces a novel Pre-trained Spatial Temporal Many-to-One (P-STMO) model for 2D-to-3D h...
We present a novel method for learning human motion models from unsegmented videos. We propose a uni...
Estimation of 3D human pose from monocular image has gained considerable attention, as a key step to...
The power of ConvNets has been demonstrated in a wide variety of vision tasks including pose estimat...
This thesis presents new methods in two closely related areas of computer vision: human pose estimat...
Tracking 3D people from monocular video is often poorly constrained. To mitigate this problem, prior...
We address the task of estimating 3D human poses from monocular camera sequences. Many works make us...
Most 3d human pose estimation methods assume that input – be it images of a scene collected from one...
Monocular 3D object detection continues to attract attention due to the cost benefits and wider avai...
We present a bundle-adjustment-based algorithm for recovering accurate 3D human pose and meshes from...
Estimating 3D poses from a monocular video is still a challenging task, despite the significant prog...
Thesis (Ph.D.)--University of Washington, 2014We propose a system to recognize both isolated and con...
We focus on the problem of automatically extracting the 3D configuration of human poses from 2D imag...
There is growing interest in human activity recognition systems, motivated by their numerous promisi...
This work addresses the challenging problem of unconstrained 3D human pose estimation (HPE) from a n...
This paper introduces a novel Pre-trained Spatial Temporal Many-to-One (P-STMO) model for 2D-to-3D h...
We present a novel method for learning human motion models from unsegmented videos. We propose a uni...
Estimation of 3D human pose from monocular image has gained considerable attention, as a key step to...
The power of ConvNets has been demonstrated in a wide variety of vision tasks including pose estimat...
This thesis presents new methods in two closely related areas of computer vision: human pose estimat...
Tracking 3D people from monocular video is often poorly constrained. To mitigate this problem, prior...