The core of self-supervised learning for pre-training language models includes pre-training task design as well as appropriate data augmentation. Most data augmentations in language model pre-training are context-independent. A seminal contextualized augmentation was recently proposed in ELECTRA, which achieved state-of-the-art performance by introducing an auxiliary generation network (generator) to produce contextualized data augmentation for training a main discrimination network (discriminator). This design, however, introduces the extra computation cost of the generator and the need to adjust the relative capability of the generator and the discriminator. In this paper, we propose a self-augmentation strategy (SAS) where a single ne...
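To make the generator/discriminator setup described in this abstract concrete, the following is a minimal sketch (not the authors' implementation) of ELECTRA-style contextualized augmentation: a small generator fills in masked positions to produce a corrupted input, and the discriminator is trained to detect which tokens were replaced. All module sizes, the vocabulary size, and the mask token id are illustrative assumptions.

```python
import torch
import torch.nn as nn

VOCAB, HIDDEN, MASK_ID = 30522, 256, 103  # assumed vocabulary size / mask token id

class TinyEncoder(nn.Module):
    """Toy Transformer encoder standing in for the generator or discriminator body."""
    def __init__(self, hidden=HIDDEN, layers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, hidden)
        layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, ids):
        return self.encoder(self.embed(ids))

generator = TinyEncoder(layers=1)       # auxiliary generation network (kept small)
gen_head = nn.Linear(HIDDEN, VOCAB)     # predicts tokens at masked positions
discriminator = TinyEncoder(layers=2)   # main discrimination network
disc_head = nn.Linear(HIDDEN, 1)        # per-token "was this token replaced?" logit

def electra_style_step(ids, mask_prob=0.15):
    # 1) Context-independent corruption: mask a random subset of positions.
    mask = torch.rand(ids.shape) < mask_prob
    masked = ids.masked_fill(mask, MASK_ID)

    # 2) Generator produces contextualized replacements at masked positions (MLM loss).
    gen_logits = gen_head(generator(masked))
    mlm_loss = nn.functional.cross_entropy(gen_logits[mask], ids[mask])
    sampled = torch.distributions.Categorical(logits=gen_logits).sample()
    corrupted = torch.where(mask, sampled, ids)  # the contextualized augmentation

    # 3) Discriminator labels each token as original (0) or replaced (1) (RTD loss).
    is_replaced = (corrupted != ids).float()
    disc_logits = disc_head(discriminator(corrupted)).squeeze(-1)
    rtd_loss = nn.functional.binary_cross_entropy_with_logits(disc_logits, is_replaced)

    # ELECTRA up-weights the discriminator objective; 50.0 follows the original paper.
    return mlm_loss + 50.0 * rtd_loss
```

The SAS idea in the abstract would, by contrast, attach both heads to a single shared encoder, removing the separate generator and the need to balance generator and discriminator capacity; the sketch above shows only the two-network baseline it improves on.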
In recent years, the development of accurate deep keyword spotting (KWS) models has resulted in KWS ...
Since the first bidirectional deep learning model for natural language understanding, BERT, emerge...
With appropriate pre-training on unstructured text, larger and more accurate neural network models c...
Recently, the development of pre-trained language models has brought natural language processing (NL...
Language Models (LMs) pre-trained with self-supervision on large text corpora have become the defaul...
The reusability of state-of-the-art Pre-trained Language Models (PLMs) is often limited by their gen...
Thesis (Ph.D.)--University of Washington, 2022. A robust language processing machine should be able to...
Self-supervised representation learning (SSRL) has improved the performance on downstream phoneme re...
Pre-training a language model and then fine-tuning it for downstream tasks has demonstrated state-of...
Self-supervised pre-training could effectively improve the performance of low-resource automatic spe...
Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural l...
Pre-trained masked language models successfully perform few-shot learning by formulating downstream ...
Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different doma...
Self-supervised speech pre-training empowers the model with the contextual structure inherent in the...