Zero-shot semantic segmentation (ZS3) aims to segment the novel categories that have not been seen in training. Existing works formulate ZS3 as a pixel-level zero-shot classification problem, and transfer semantic knowledge from seen classes to unseen ones with the help of language models pre-trained only with texts. While simple, the pixel-level ZS3 formulation shows limited capability to integrate vision-language models that are often pre-trained with image-text pairs and currently demonstrate great potential for vision tasks. Inspired by the observation that humans often perform segment-level semantic labeling, we propose to decouple ZS3 into two sub-tasks: 1) a class-agnostic grouping task to group the pixels into segments; 2) a zero-...
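The decoupling described in this abstract can be sketched as a two-stage pipeline: first group pixels into class-agnostic segments, then classify each segment by comparing its pooled embedding against text embeddings of candidate class names. The sketch below is a minimal illustration under assumed inputs, not the paper's actual model: the grouping stage is stubbed with nearest-centroid assignment over random centroids, and all embeddings are random stand-ins for features a vision-language model would produce.

```python
import numpy as np

rng = np.random.default_rng(0)

def group_pixels(pixel_embeddings, num_segments):
    """Stage 1 (class-agnostic grouping), stubbed here as nearest-centroid
    assignment over random centroids; a real model would predict segments."""
    h, w, d = pixel_embeddings.shape
    centroids = rng.normal(size=(num_segments, d))
    flat = pixel_embeddings.reshape(-1, d)
    # assign every pixel to its closest centroid -> one segment id per pixel
    dists = ((flat[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1).reshape(h, w)

def classify_segments(pixel_embeddings, segment_map, text_embeddings):
    """Stage 2 (zero-shot classification): label each segment by cosine
    similarity between its mean feature and class-name text embeddings."""
    labels = {}
    for seg_id in np.unique(segment_map):
        seg_feat = pixel_embeddings[segment_map == seg_id].mean(axis=0)
        seg_feat /= np.linalg.norm(seg_feat)
        sims = text_embeddings @ seg_feat  # text_embeddings are L2-normalized
        labels[int(seg_id)] = int(sims.argmax())
    return labels

# Toy data: an 8x8 "image" with 16-d pixel features and 5 candidate classes.
pixels = rng.normal(size=(8, 8, 16))
texts = rng.normal(size=(5, 16))
texts /= np.linalg.norm(texts, axis=1, keepdims=True)

seg_map = group_pixels(pixels, num_segments=3)
seg_labels = classify_segments(pixels, seg_map, texts)
print(seg_map.shape, seg_labels)
```

Because stage 2 only needs a text embedding per class name, unseen categories are handled by adding their names to `texts`; no segment-level training data for those classes is required.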
Zero-shot semantic segmentation aims to recognize the semantics of pixels from unseen categories wit...
Being able to segment unseen classes not observed during training is an important technical challeng...
CLIP has enabled new and exciting joint vision-language applications, one of which is open-vocabular...
Semantic segmentation models are limited in their ability to scale to large nu...
Recently, CLIP has been applied to pixel-level zero-shot learning tasks via a two-stage scheme. The ...
To bridge the gap between supervised semantic segmentation and real-world applications that acquire...
Image classification is one of the essential tasks for the intelligent visual system. Conventional i...
Semantic segmentation is one of the most fundamental problems in computer vision and pixel-level lab...
This paper addresses the task of learning an image classifier when some categories...
Human beings have the remarkable ability to recognize novel visual objects only based on the descrip...
Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classifica...
Zero-shot learning (ZSL) has been widely studied in recent years to solve the problem of lacking annotatio...
Zero-shot learning (ZSL) aims to recognize unseen image categories by learning an embedding space be...
Zero-Shot Learning (ZSL) aims at recognizing unseen classes that are absent during the training stag...