The objective of this work is to explore how to effectively and efficiently adapt pre-trained visual foundation models to various downstream tasks of semantic segmentation. Previous methods usually fine-tuned the entire networks for each specific dataset, which will be burdensome to store massive parameters of these networks. A few recent works attempted to insert some extra trainable parameters into the frozen networks to learn visual prompts for parameter-efficient tuning. However, these works showed poor generality as they were designed specifically for Transformers. Moreover, using limited information in these schemes, they exhibited a poor capacity to learn beneficial prompts. To alleviate these issues, we propose a novel Stage-wise Pr...
Deep convolutional networks for semantic image segmentation typically require large-scale labeled da...
Convolutional networks are powerful visual models that yield hierarchies of features. We show that c...
27 pages, 9 figures, 11 tablesInternational audienceDeep learning based pipelines for semantic segme...
Recently, CLIP-based approaches have exhibited remarkable performance on generalization and few-shot...
The current modus operandi in adapting pre-trained models involves updating all the backbone paramet...
Prompt tuning has become a new paradigm for model tuning and it has demonstrated success in natural ...
The size of vision models has grown exponentially over the last few years, especially after the emer...
We present a new paradigm for fine-tuning large-scale vision-language pre-trained models on downstre...
Vision transformers (ViTs) encoding an image as a sequence of patches bring new paradigms for semant...
Semantic segmentation is among the most significant applications in computer vision. The goal of sem...
Scene parsing entails interpretation of the visual world in terms of meaningful semantic concepts. A...
State-of-the-art semantic image segmentation methods are mostly based on training deep convolutional...
Automated design of neural network architectures tailored for a specific task is an extremely promis...
We investigate the efficacy of visual prompting to adapt large-scale models in vision. Following the...
Recent efforts in semantic segmentation using deep learning framework have made notable advances. Wh...
Deep convolutional networks for semantic image segmentation typically require large-scale labeled da...
Convolutional networks are powerful visual models that yield hierarchies of features. We show that c...
27 pages, 9 figures, 11 tablesInternational audienceDeep learning based pipelines for semantic segme...
Recently, CLIP-based approaches have exhibited remarkable performance on generalization and few-shot...
The current modus operandi in adapting pre-trained models involves updating all the backbone paramet...
Prompt tuning has become a new paradigm for model tuning and it has demonstrated success in natural ...
The size of vision models has grown exponentially over the last few years, especially after the emer...
We present a new paradigm for fine-tuning large-scale vision-language pre-trained models on downstre...
Vision transformers (ViTs) encoding an image as a sequence of patches bring new paradigms for semant...
Semantic segmentation is among the most significant applications in computer vision. The goal of sem...
Scene parsing entails interpretation of the visual world in terms of meaningful semantic concepts. A...
State-of-the-art semantic image segmentation methods are mostly based on training deep convolutional...
Automated design of neural network architectures tailored for a specific task is an extremely promis...
We investigate the efficacy of visual prompting to adapt large-scale models in vision. Following the...
Recent efforts in semantic segmentation using deep learning framework have made notable advances. Wh...
Deep convolutional networks for semantic image segmentation typically require large-scale labeled da...
Convolutional networks are powerful visual models that yield hierarchies of features. We show that c...
27 pages, 9 figures, 11 tablesInternational audienceDeep learning based pipelines for semantic segme...