Can continuous diffusion models bring the same performance breakthrough to natural language that they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens into a continuous space of embeddings, as is standard in language modeling. We propose Self-conditioned Embedding Diffusion, a continuous diffusion mechanism that operates on token embeddings and makes it possible to learn flexible and scalable diffusion models for both conditional and unconditional text generation. Through qualitative and quantitative evaluation, we show that our text diffusion models generate samples comparable with those produced by standard autoregressive language models, while being in theory more efficient on accelerator hardware at ...
Text-to-image diffusion models have recently received a lot of interest for their astonishing abilit...
Diffusion models have recently shown great promise for generative modeling, outperforming GANs on pe...
Conditioned diffusion models have demonstrated state-of-the-art text-to-image synthesis capacity. Re...
Diffusion models have garnered considerable interest in the field of text generation. Several studie...
Text-conditioned image generation models have recently shown immense qualitative success using denoi...
Controlling the behavior of language models (LMs) without re-training is a major open problem in nat...
The recent surge of generative AI has been fueled by the generative power of diffusion probabilistic...
Training diffusion models on limited datasets poses challenges in terms of limited generation capaci...
Can a text-to-image diffusion model be used as a training objective for adapting a GAN generator to ...
With the spread of Text2Img diffusion models such as DALL-E 2, Imagen, Midjourney and St...
Diffusion models have emerged as powerful generative models in the text-to-image domain. This paper ...
The image captioning task has been extensively researched in previous work. However, limited experiments...
Denoising diffusion models (DDMs) have been drawing much attention for their appreciable sample qual...
Text-to-motion generation is a formidable task, aiming to produce human motions that align with the ...
Generative image synthesis with diffusion models has recently achieved excellent visual quality in s...