Current pre-trained language models (PLMs) are typically trained with static data, ignoring that in real-world scenarios, streaming data of various sources may continuously grow. This requires PLMs to integrate the information from all the sources in a lifelong manner. Although this goal could be achieved by exhaustive pre-training on all the existing data, such a process is known to be computationally expensive. To this end, we propose ELLE, aiming at efficient lifelong pre-training for emerging data. Specifically, ELLE consists of (1) function preserved model expansion, which flexibly expands an existing PLM's width and depth to improve the efficiency of knowledge acquisition; and (2) pre-trained domain prompts, which disentangle the versatile knowledge learned during pre-training and stimulate the proper knowledge for downstream tasks.
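The "function preserved model expansion" in (1) is in the spirit of Net2Net: a smaller PLM is grown in width and depth so that the enlarged model initially computes the same function as the original, and pre-training then continues from that initialization. The sketch below illustrates the width-expansion half of that idea for a single pair of linear layers; it is a minimal PyTorch illustration under that assumption, not ELLE's actual code, and the helper name `widen_pair` is hypothetical.

```python
import torch
import torch.nn as nn

def widen_pair(fc1: nn.Linear, fc2: nn.Linear, new_hidden: int):
    """Net2Net-style width expansion: grow the hidden dimension shared by
    fc1 (producer) and fc2 (consumer) from fc1.out_features to new_hidden
    while preserving the function x -> fc2(act(fc1(x))) for any
    element-wise activation act."""
    old_hidden = fc1.out_features
    assert new_hidden > old_hidden
    # Each new hidden unit is a copy of a randomly chosen existing unit.
    mapping = torch.cat([
        torch.arange(old_hidden),
        torch.randint(0, old_hidden, (new_hidden - old_hidden,)),
    ])
    # How many copies (including the original) each old unit now has.
    counts = torch.bincount(mapping, minlength=old_hidden).float()

    wide_fc1 = nn.Linear(fc1.in_features, new_hidden)
    wide_fc2 = nn.Linear(new_hidden, fc2.out_features)
    with torch.no_grad():
        # Duplicate the incoming weights of copied units, so the widened
        # hidden activation is the old one with repeated entries.
        wide_fc1.weight.copy_(fc1.weight[mapping])
        wide_fc1.bias.copy_(fc1.bias[mapping])
        # Duplicate the outgoing weights and divide by the copy count,
        # so the replicas jointly contribute what the original unit did.
        wide_fc2.weight.copy_(fc2.weight[:, mapping] / counts[mapping])
        wide_fc2.bias.copy_(fc2.bias)
    return wide_fc1, wide_fc2

# Sanity check: the widened pair computes the same function.
fc1, fc2 = nn.Linear(128, 256), nn.Linear(256, 128)
wide_fc1, wide_fc2 = widen_pair(fc1, fc2, new_hidden=512)
x = torch.randn(4, 128)
assert torch.allclose(fc2(torch.relu(fc1(x))),
                      wide_fc2(torch.relu(wide_fc1(x))), atol=1e-5)
```

Because the copies of a unit share identical incoming weights, their outgoing weights are rescaled by the copy count so that the sum over replicas reproduces the original unit's contribution; depth expansion can be made function-preserving analogously, e.g. by initializing new layers near the identity.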
Despite achieving state-of-the-art performance on many NLP tasks, the high energy cost and long infe...
Machine learning (ML) pipelines for model training and validation typically include preprocessing, s...
Existing pre-trained models are generally geared towards a particular class of problems. To date, th...
Pretrained language models (PTLMs) are typically learned over a large, static corpus and further fin...
Transfer learning applies knowledge or patterns learned in a particular field or task to differe...
Pre-training language models (LMs) on large-scale unlabeled text data makes them much easier to...
The task of data-to-text generation amounts to describing structured data, such as RDF triples, in f...
Pretrained language models (PLMs) have demonstrated remarkable performance in various natural langua...
Large language models (LLMs) have demonstrated remarkable open-domain capabilities. Traditionally, L...
The reusability of state-of-the-art Pre-trained Language Models (PLMs) is often limited by their gen...
Natural language processing (NLP) techniques have been significantly improved by introducing pre-trained l...
Recent advances in NLP have been brought by a range of large-scale pretrained language models (PLMs). Thes...
Thesis (Ph.D.)--University of Washington, 2022. A robust language processing machine should be able to...
Despite achieving state-of-the-art performance on many NLP tasks, the high energy cost and long infe...
Since the first bidirectional deep learning model for natural language understanding, BERT, emerge...