In this work, we leverage estimated depth to boost self-supervised contrastive learning for segmentation of urban scenes, where unlabeled videos are readily available for training self-supervised depth estimation. We argue that the semantics of a coherent group of pixels in 3D space is self-contained and invariant to the contexts in which they appear. We group coherent, semantically related pixels into coherent depth regions given their estimated depth and use copy-paste to synthetically vary their contexts. In this way, cross-context correspondences are built in contrastive learning and a context-invariant representation is learned. For unsupervised semantic segmentation of urban scenes, our method surpasses the previous state-of-the-art b...
Semi-supervised semantic segmentation focuses on the exploration of a small amount of labeled data a...
Self-supervised monocular depth estimation has shown impressive results in static scenes. It relies ...
Recent progress in contrastive learning has revolutionized unsupervised representation learn...
Self-supervised contrastive learning has achieved remarkable performance in computer vision. Its suc...
Traditionally, training neural networks to perform semantic segmentation required expensive human-ma...
Recent self-supervised models have demonstrated equal or better performance than supervised methods,...
Current semantic segmentation methods focus only on mining “local” context, i.e., dependencies betwe...
This work considers supervised contrastive learning for semantic segmentation. We apply contrastive ...
Although recent semantic segmentation methods have made remarkable progress, they still rely on larg...
Monocular depth estimation using novel learning-based approaches has recently emerged as a promisin...
Although recent semantic segmentation methods have made remarkable progress, they still rely on larg...
Most approaches for semantic segmentation use only information from color cameras to parse the scene...
In this paper, we tackle the problem of learning visual representations from unlabeled scene-centric...
In this paper, we present a deep learning model that exploits the power of self-supervision to perfo...
Image segmentation and depth estimation are crucial tasks in computer vision, especially in autonomo...
Semi-supervised semantic segmentation focuses on the exploration of a small amount of labeled data a...
Self-supervised monocular depth estimation has shown impressive results in static scenes. It relies ...
Recent progress in contrastive learning has revolutionized unsupervised representation learn...
Self-supervised contrastive learning has achieved remarkable performance in computer vision. Its suc...
Traditionally, training neural networks to perform semantic segmentation required expensive human-ma...
Recent self-supervised models have demonstrated equal or better performance than supervised methods,...
Current semantic segmentation methods focus only on mining “local” context, i.e., dependencies betwe...
This work considers supervised contrastive learning for semantic segmentation. We apply contrastive ...
Although recent semantic segmentation methods have made remarkable progress, they still rely on larg...
Monocular depth estimation using novel learning-based approaches has recently emerged as a promisin...
Although recent semantic segmentation methods have made remarkable progress, they still rely on larg...
Most approaches for semantic segmentation use only information from color cameras to parse the scene...
In this paper, we tackle the problem of learning visual representations from unlabeled scene-centric...
In this paper, we present a deep learning model that exploits the power of self-supervision to perfo...
Image segmentation and depth estimation are crucial tasks in computer vision, especially in autonomo...
Semi-supervised semantic segmentation focuses on the exploration of a small amount of labeled data a...
Self-supervised monocular depth estimation has shown impressive results in static scenes. It relies ...
Recent progress in contrastive learning has revolutionized unsupervised representation learn...