This dissertation addresses two significant challenges of large language models (LLMs): robustness and scalability. Firstly, we focus on improving large language model robustness through the lens of learning code representations. I highlight our work on ContraCode which learns representations of code that are robust to label-preserving edits. Secondly, we tackle scalability challenges from a systems perspective. We present Checkmate, a system to support training models beyond GPU memory capacity limits through optimal rematerialization. Furthermore, Skyplane, a system that optimizes bulk data transfers between cloud object stores, enables training models on larger pre-training datasets in the cloud. Together, these contributions present a r...
The crystallization of modeling methods around the Transformer architecture has been a boon for prac...
Large Language Models (LLMs) play an ever-increasing role in the field of Artificial Intelligence (A...
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstr...
Large language models (LMs) of code have recently shown tremendous promise in completing code and sy...
This paper reports on the benefits of largescale statistical language modeling in machine translatio...
Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional com...
Thesis (Ph.D.)--University of Washington, 2023Language models (LMs) are at the core of almost all st...
Language models demonstrate both quantitative improvement and new qualitative capabilities with incr...
Machine-learning models can reach very high performance with supervised training, where they learn f...
Thesis (Ph.D.)--University of Washington, 2019Data, models, and computing are the three pillars that...
Robustness of a model plays a vital role in large scale machine learning. Classical estimators in ro...
In this research project, we aim to explore the use of Large Language Models (LLMs) as an interface ...
In the rapidly advancing field of large language model (LLM) research, platforms like Stack Overflow...
Despite their wide adoption, the underlying training and memorization dynamics of very large languag...
Large language models (LLMs)—machine learning algorithms that can recognize, summarize, translate,...
The crystallization of modeling methods around the Transformer architecture has been a boon for prac...
Large Language Models (LLMs) play an ever-increasing role in the field of Artificial Intelligence (A...
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstr...
Large language models (LMs) of code have recently shown tremendous promise in completing code and sy...
This paper reports on the benefits of largescale statistical language modeling in machine translatio...
Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional com...
Thesis (Ph.D.)--University of Washington, 2023Language models (LMs) are at the core of almost all st...
Language models demonstrate both quantitative improvement and new qualitative capabilities with incr...
Machine-learning models can reach very high performance with supervised training, where they learn f...
Thesis (Ph.D.)--University of Washington, 2019Data, models, and computing are the three pillars that...
Robustness of a model plays a vital role in large scale machine learning. Classical estimators in ro...
In this research project, we aim to explore the use of Large Language Models (LLMs) as an interface ...
In the rapidly advancing field of large language model (LLM) research, platforms like Stack Overflow...
Despite their wide adoption, the underlying training and memorization dynamics of very large languag...
Large language models (LLMs)—machine learning algorithms that can recognize, summarize, translate,...
The crystallization of modeling methods around the Transformer architecture has been a boon for prac...
Large Language Models (LLMs) play an ever-increasing role in the field of Artificial Intelligence (A...
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstr...