The recent surge of generative AI has been fueled by the generative power of diffusion probabilistic models and the scalable capabilities of large language models. Despite their potential, it remains unclear whether diffusion language models can solve general language tasks comparably to their autoregressive counterparts. This paper demonstrates that scaling diffusion models with respect to data, model size, and tasks can make them strong language learners. We build competent diffusion language models at scale by first acquiring knowledge from massive data via masked language modeling pretraining, exploiting the intrinsic connection between masked language modeling and diffusion. We then reprogram the pretrained masked language models into diffusion language models via diffusive adaptation, ...
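As a point of reference for the abstract above, the sketch below illustrates the mask-and-reconstruct objective that links masked language modeling to absorbing-state discrete diffusion, which is the connection the abstract appeals to. It is a minimal illustration under assumed names (the function diffusion_mlm_loss and a model that returns per-token vocabulary logits are hypothetical), not the paper's actual diffusive-adaptation implementation.

    # Minimal sketch (not the authors' implementation): a discrete
    # absorbing-state diffusion step masks a random fraction of tokens
    # (the fraction plays the role of the timestep t) and trains the
    # model to reconstruct them -- the same reconstruction loss used in
    # masked language modeling pretraining.
    import torch
    import torch.nn.functional as F

    def diffusion_mlm_loss(model, input_ids, mask_token_id, pad_token_id):
        """One mask-and-reconstruct training step; `model` is assumed to be
        any encoder returning logits of shape (batch, seq_len, vocab)."""
        batch, seq_len = input_ids.shape
        # Sample a per-sequence "timestep": the fraction of tokens to corrupt.
        t = torch.rand(batch, 1, device=input_ids.device)            # t ~ U(0, 1)
        corrupt = torch.rand(batch, seq_len, device=input_ids.device) < t
        corrupt &= input_ids.ne(pad_token_id)                        # never mask padding

        noisy_ids = input_ids.masked_fill(corrupt, mask_token_id)    # x_t: absorbing state
        logits = model(noisy_ids)

        # Cross-entropy only on the corrupted positions, as in MLM pretraining.
        labels = input_ids.masked_fill(~corrupt, -100)
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               labels.reshape(-1), ignore_index=-100)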
Recent trends in language modeling have focused on increasing performance through scaling, and have ...
Thesis (Ph.D.)--University of Washington, 2023. Language models (LMs) are at the core of almost all st...
Language model fine-tuning is essential for modern natural language processing, but is computational...
Deep learning shows excellent potential in generation tasks thanks to deep latent representation. Ge...
Controlling the behavior of language models (LMs) without re-training is a major open problem in nat...
Scaling language models with more data, compute and parameters has driven significant progress in na...
Can continuous diffusion models bring the same performance breakthrough on natural language they did...
Training diffusion models on limited datasets poses challenges in terms of limited generation capaci...
Text-conditioned image generation models have recently shown immense qualitative success using denoi...
When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown e...
Deploying large language models (LLMs) is challenging because they are memory inefficient and comput...
Language models demonstrate both quantitative improvement and new qualitative capabilities with incr...
Large language models have exhibited emergent abilities, demonstrating exceptional performance acros...
Deep generative models produce data according to a learned representation, e.g. diffusion models, th...
Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnabl...