Transformers have emerged as the state-of-the-art neural network architecture for natural language processing and computer vision. In the foundation model paradigm, large transformer models (BERT, GPT-3/4, BLOOM, ViT) are pre-trained on self-supervised tasks such as word or image masking, and then adapted through fine-tuning for downstream user applications, including instruction following and question answering. While many approaches have been developed for model fine-tuning, including low-rank weight update strategies (e.g., LoRA), the underlying mathematical principles that enable network adaptation without knowledge loss remain poorly understood. Here, we introduce a differential geometry framework, functionally invariant paths (FIP), that prov...
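The low-rank update strategy mentioned above can be illustrated with a minimal sketch. This is not the FIP paper's method, only the generic LoRA-style idea of freezing a pre-trained weight and training a small rank-r correction; all shapes, names, and initializations here are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a LoRA-style low-rank weight update (illustrative
# shapes; a real adapter would be trained with gradient descent).
d_out, d_in, r = 64, 64, 4           # rank r is much smaller than d_in, d_out
W0 = np.random.randn(d_out, d_in)    # pre-trained weight, kept frozen
A = np.random.randn(r, d_in) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))             # trainable up-projection, initialized to zero

def adapted_forward(x):
    # Effective weight is W0 + B @ A; the correction B @ A has rank <= r,
    # so only r * (d_in + d_out) parameters are learned instead of d_in * d_out.
    return (W0 + B @ A) @ x

x = np.random.randn(d_in)
y = adapted_forward(x)
```

Because B starts at zero, the adapted model initially reproduces the frozen pre-trained model exactly, which is one reason this family of methods tends to preserve prior knowledge.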
The highly non-linear nature of deep neural networks causes them to be susceptible to adversarial ex...
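As a concrete illustration of the susceptibility referred to above, here is a hedged sketch of the fast gradient sign method (FGSM), one standard way adversarial examples are crafted; the abstract does not specify its attack, and `grad_loss` below is a hypothetical stand-in for the gradient of a model's loss with respect to its input.

```python
import numpy as np

def fgsm_perturb(x, grad_loss, eps=0.03):
    # Move each input coordinate by eps in the direction that increases
    # the loss: an L-infinity-bounded adversarial perturbation.
    return x + eps * np.sign(grad_loss)

x = np.array([0.2, -0.5, 0.9])
grad_loss = np.array([1.3, -0.7, 0.0])  # toy gradient values
x_adv = fgsm_perturb(x, grad_loss)
```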
In recent years, artificial intelligence and machine learning have witnessed a radical transform...
Deep neural networks of sizes commonly encountered in practice are proven to c...
Machine learning models are subject to changing circumstances and will degrade over time. Nowadays,...
Artificial neural network learning is typically accomplished via adaptation between neurons. This pa...
Learning the gradient of a neuron's activity function, like the weights of its links, gives rise to a new specificat...
In this work, we suggest Kernel Filtering Linear Overparameterization (KFLO), where a linear cascade...
Deep learning has produced impressive empirical breakthroughs, but many theoretical ques...
We study the theory of neural networks (NNs) through the lens of classical nonparametric regression probl...
Over the past decade, deep neural networks have solved ever more complex tasks across many fronts in...
Thesis (Ph.D.), University of Washington, 2020. In the past decade, deep learning has revolutionized ma...
It is now widely acknowledged that neural network language models outperform backoff language models in a...
In contrast to SGD, adaptive gradient methods like Adam allow robust training of modern deep network...
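For reference, the Adam update this abstract contrasts with SGD can be sketched as follows; this is the standard Kingma & Ba formulation of a single step, with illustrative variable names and default hyperparameters, not code from the cited work.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and its elementwise square.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)  # bias-corrected second moment
    # Per-coordinate adaptive step: large accumulated gradients get scaled down.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
grad = np.array([0.1, -0.2, 0.3])
theta, m, v = adam_step(theta, grad, m, v, t=1)
```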
Across scientific and engineering disciplines, the algorithmic pipeline for processing and understand...
Deep neural networks have relieved human experts of a great deal of the burden of feature en...