The size of vision models has grown exponentially over the last few years, especially after the emergence of Vision Transformer. This has motivated the development of parameter-efficient tuning methods, such as learning adapter layers or visual prompt tokens, which allow a tiny portion of the model parameters to be trained while the vast majority, obtained from pre-training, are kept frozen. However, designing a proper tuning method is non-trivial: one might need to try out a lengthy list of design choices, not to mention that each downstream dataset often requires custom designs. In this paper, we view the existing parameter-efficient tuning methods as "prompt modules" and propose Neural prOmpt seArcH (NOAH), a novel approach that learns, for large...
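To make the idea of parameter-efficient tuning concrete, below is a minimal sketch of one such "prompt module": a bottleneck adapter attached to a frozen pre-trained block. All names, dimensions, and the backbone choice are illustrative assumptions, not the paper's implementation or NOAH's searched design.

```python
# Illustrative sketch of parameter-efficient tuning with an adapter layer.
# A small bottleneck MLP is inserted after a frozen pre-trained block, and
# only the adapter's parameters are updated during fine-tuning.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # project to a small bottleneck
        self.up = nn.Linear(bottleneck, dim)     # project back to the model width
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # residual connection keeps the frozen backbone's features intact
        return x + self.up(self.act(self.down(x)))

# Hypothetical usage: freeze a pre-trained backbone block and train only the adapter.
backbone = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
for p in backbone.parameters():
    p.requires_grad = False                      # pre-trained weights stay frozen

adapter = Adapter(dim=768)                       # only this tiny module is trained
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-3)

x = torch.randn(2, 16, 768)                      # (batch, tokens, dim) dummy input
features = adapter(backbone(x))
```

Only the adapter's handful of parameters receive gradients here; the same pattern applies to other prompt modules such as learnable prompt tokens prepended to the input sequence.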