Recently, remarkable advances have been achieved in 3D human pose estimation from monocular images because of the powerful Deep Convolutional Neural Networks (DC- NNs). Despite their success on large-scale datasets col- lected in the constrained lab
We introduce new, fine-grained action and emotion recognition tasks defined on non-staged videos, recorded during robot-assisted therapy sessions of children with autism. The tasks present several challenges: a large dataset with long videos, a larg
We develop a 3D object detection algorithm that uses latent support surfaces to capture contextual relationships in indoor scenes. Existing 3D representations for RGB-D images capture the local shape and appearance of object categories, but have lim
We propose a scalable, efficient and accurate approach to retrieve 3D models for objects in the wild. Our contri- bution is twofold. We first present a 3D pose estimation approach for object categories which significantly outper- forms the state-of-
Observing that Semantic features learned in an image classification task and Appearance features learned in a similarity matching task complement each other, we build a twofold Siamese network, named SA-Siam, for real-time object tracking. SA-Siam i
This paper addresses the problem of video object segmentation, where the initial object mask is given in the first frame of an input video. We propose a novel spatiotemporal Markov Random Field (MRF) model defined over pixels to handle this problem.
We present a new compact but dense representation of scene geometry which is conditioned on the intensity data from a single image and generated from a code consisting of a small number of parameters. We are inspired by work both on learned depth fr