Score-based diffusion models have captured widespread attention and fueled fast progress in recent vision generative tasks. In this paper, we focus on the diffusion model backbone, which has been largely neglected so far. We systematically explore vision Transformers as diffusion learners for various generative tasks. With our improvements, the performance of a vanilla ViT-based backbone (IU-ViT) is boosted to be on par with traditional U-Net-based methods. We further provide a hypothesis on the implications of disentangling the generative backbone into an encoder-decoder structure and show proof-of-concept experiments verifying the effectiveness of a stronger encoder for generative tasks with an ASymmetriC ENcoder Decoder (ASCEND). Our improvements achieve...
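As a minimal sketch of the idea described above, the snippet below implements a ViT-style diffusion denoiser whose Transformer blocks are split into a deeper encoder and a shallower decoder. This is not the paper's actual IU-ViT/ASCEND code; all class names, layer counts, and hyperparameters here are illustrative assumptions.

```python
# Illustrative sketch (assumed names, not the paper's implementation):
# a ViT-based diffusion backbone with an asymmetric encoder-decoder split.
import math
import torch
import torch.nn as nn


def timestep_embedding(t, dim):
    """Standard sinusoidal embedding of the diffusion timestep t."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, device=t.device) / half)
    args = t.float()[:, None] * freqs[None]
    return torch.cat([torch.cos(args), torch.sin(args)], dim=-1)


class ViTDiffusionBackbone(nn.Module):
    def __init__(self, img_size=32, patch=2, in_ch=3, dim=384,
                 enc_depth=8, dec_depth=4, heads=6):
        super().__init__()
        self.patch, self.in_ch = patch, in_ch
        num_patches = (img_size // patch) ** 2
        self.patchify = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, num_patches, dim))
        self.t_mlp = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))
        block = lambda: nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        # Asymmetric split: more capacity in the encoder than the decoder.
        self.encoder = nn.ModuleList(block() for _ in range(enc_depth))
        self.decoder = nn.ModuleList(block() for _ in range(dec_depth))
        self.head = nn.Linear(dim, patch * patch * in_ch)

    def forward(self, x_t, t):
        B, _, H, W = x_t.shape
        # Patchify the noisy image into a token sequence and add positions.
        h = self.patchify(x_t).flatten(2).transpose(1, 2) + self.pos
        # Inject the timestep as an additive token-wise conditioning signal.
        h = h + self.t_mlp(timestep_embedding(t, h.shape[-1]))[:, None]
        for blk in self.encoder:
            h = blk(h)
        for blk in self.decoder:
            h = blk(h)
        out = self.head(h)  # per-patch noise prediction
        # Unpatchify back to image shape.
        p, c = self.patch, self.in_ch
        out = out.view(B, H // p, W // p, p, p, c).permute(0, 5, 1, 3, 2, 4)
        return out.reshape(B, c, H, W)


if __name__ == "__main__":
    model = ViTDiffusionBackbone()
    x = torch.randn(4, 3, 32, 32)   # noisy images x_t
    t = torch.randint(0, 1000, (4,))  # diffusion timesteps
    print(model(x, t).shape)  # torch.Size([4, 3, 32, 32])
```

In this sketch the "stronger encoder" hypothesis is expressed simply by giving the encoder twice the depth of the decoder; how the paper actually disentangles and balances the two halves is specified in the full text, not here.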