The advent of large-scale pre-trained language models has contributed greatly to the recent progress in natural language processing. Many state-of-the-art language models are first trained on a large text corpus and then fine-tuned on downstream tasks. Despite its recent success and wide adoption, fine-tuning a pre-trained language model often suffers from overfitting, which leads to poor generalizability due to the extremely high complexity of the model and the limited training samples from downstream tasks. To address this problem, we propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR). Specifically, we propose to inject the standard Gaussian noise or In-manifold noise and regularize ...
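To make the idea above concrete, here is a minimal sketch of noise-stability regularization on a toy feed-forward encoder in plain PyTorch: Gaussian noise is injected into the output of one layer, and the squared shift it causes in the subsequent layers' representations is added to the task loss. The toy encoder, the injection layer, the noise scale sigma, and the weight lambda_reg are all illustrative assumptions, not the architecture or hyperparameters from the paper.

```python
# Illustrative sketch only: noise injection at one layer plus a stability penalty
# on the layers above it. All names and values here are assumptions for the demo.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Stand-in for a pre-trained encoder: a small stack of feed-forward blocks."""
    def __init__(self, dim=64, num_layers=4, num_classes=2):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_layers)]
        )
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x, inject_at=None, sigma=0.0):
        hidden_states = []
        h = x
        for i, layer in enumerate(self.layers):
            h = layer(h)
            if inject_at is not None and i == inject_at:
                # Inject standard Gaussian noise into this layer's output.
                h = h + sigma * torch.randn_like(h)
            hidden_states.append(h)
        return self.head(h), hidden_states

def lnsr_loss(model, x, targets, inject_at=1, sigma=0.1, lambda_reg=1.0):
    logits_clean, hs_clean = model(x)
    _, hs_noisy = model(x, inject_at=inject_at, sigma=sigma)
    # Stability term: how much the injected noise perturbs the representations
    # of every layer above the injection point.
    stability = sum(
        (hn - hc).pow(2).mean()
        for hn, hc in zip(hs_noisy[inject_at + 1:], hs_clean[inject_at + 1:])
    )
    return F.cross_entropy(logits_clean, targets) + lambda_reg * stability

# Usage on random data, just to show the regularized objective is differentiable.
model = ToyEncoder()
x, targets = torch.randn(8, 64), torch.randint(0, 2, (8,))
loss = lnsr_loss(model, x, targets)
loss.backward()
```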
Injecting noise within gradient descent has several desirable features. In this paper, we explore no...
In this paper, we investigate the usage of large language models (LLMs) to improve the performance o...
To mitigate the problem of having to traverse over the full vocabulary in the softmax normalization ...
Adopting a two-stage paradigm of pretraining followed by fine-tuning, Pretrained Language Models (PL...
The reusability of state-of-the-art Pre-trained Language Models (PLMs) is often limited by their gen...
While Automatic Speech Recognition (ASR) models have shown significant advances with the introductio...
Language model fine-tuning is essential for modern natural language processing, but is computational...
High-quality instruction-tuning data is critical to improving LLM capabilities. Existing data collec...
Self-supervised representation learning (SSRL) has improved the performance on downstream phoneme re...
Utilizing text-only data with an external language model (ELM) in end-to-end RNN-Transducer (RNN-T) ...
In recent research, in the domain of speech processing, large End-to-End (E2E) systems for Automatic...
Automatic Speech Recognition (ASR) systems have found their use in numerous industrial applications ...
In this work, we study the impact of Large-scale Language Models (LLMs) on Automated Speech Recogniti...
Data augmentation is a widely used technique in machine learning to improve model performance. Howev...
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
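For reference, a minimal sketch of bias-only fine-tuning in the spirit of BitFit, assuming a Hugging Face BERT checkpoint: every parameter is frozen except the bias terms and the newly initialized classification head. The checkpoint name, label count, and learning rate are illustrative assumptions, not settings from the paper.

```python
# Illustrative sketch of bias-only fine-tuning; checkpoint and hyperparameters
# are placeholders, not values taken from the BitFit paper.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Keep gradients only for bias terms and the freshly initialized classifier head.
for name, param in model.named_parameters():
    param.requires_grad = name.endswith(".bias") or name.startswith("classifier")

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
print(f"trainable parameters: {sum(p.numel() for p in trainable):,}")
```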