We propose a framework to modularize the training of neural language models that use diverse forms of sentence-external context (including metadata) by eliminating the need to jointly train sentence-external and within-sentence encoders. Our approach, contextual universal embeddings (CUE), trains LMs on one set of contextual signals, such as date and author, and adapts to novel metadata types, such as article title or previous sentence. The model consists of a pretrained neural sentence LM, a BERT-based context encoder, and a masked transformer decoder that estimates LM probabilities using sentence-internal and sentence-external information. When context or metadata are unavailable, our model learns to combine contextual and sentence-internal information...
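To make the described architecture concrete, the following is a minimal, hypothetical sketch of how a masked (causal) transformer decoder can combine sentence-internal token history with a sentence-external context vector via cross-attention. The stand-in modules, the dimensions, and the small `ctx_proj` adapter are illustrative assumptions, not the authors' implementation; in the paper the context vector would come from a BERT-based encoder over metadata such as date, author, or the previous sentence.

```python
# Minimal sketch of the CUE idea (illustrative only, not the authors' code).
# Assumptions: the pretrained sentence LM and BERT-style context encoder are
# replaced by small randomly initialized stand-ins; sizes are arbitrary.
import torch
import torch.nn as nn

class CUEDecoder(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256, d_ctx=768, n_layers=2, n_heads=4):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        # Hypothetical adapter: maps an external context embedding into the
        # decoder's space; only a small module like this would need adapting
        # when a novel metadata type is introduced.
        self.ctx_proj = nn.Linear(d_ctx, d_model)
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids, ctx_embedding):
        # token_ids: (batch, seq) word history; ctx_embedding: (batch, d_ctx)
        x = self.tok_emb(token_ids)
        seq_len = token_ids.size(1)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        # Sentence-external information enters through cross-attention ("memory").
        memory = self.ctx_proj(ctx_embedding).unsqueeze(1)
        h = self.decoder(x, memory, tgt_mask=causal_mask)
        return self.lm_head(h)  # next-word logits

# Toy usage with a random stand-in context vector.
model = CUEDecoder()
tokens = torch.randint(0, 1000, (2, 8))
ctx = torch.randn(2, 768)
print(model(tokens, ctx).shape)  # torch.Size([2, 8, 1000])
```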
A salient feature of Neural Machine Translation (NMT) is the end-to-end nature of training employed,...
In-context learning is a recent paradigm in natural language understanding, where a large pre-traine...
Causal language modeling (LM) uses word history to predict the next word. BERT, on the other hand, m...
An end-to-end (E2E) ASR model implicitly learns a prior Internal Language Model (ILM) from the train...
Language models (LMs) have been used in cognitive modeling as well as engineering studies -- they co...
A variety of contextualised language models have been proposed in the NLP community, which are train...
Vector representation of sentences is important for many text processing tasks that involve classify...
When pre-trained on large unsupervised textual corpora, language models are able to store and retri...
Self-supervised large language models (LMs) have become a highly-influential and foundational tool f...
State-of-the-art encoder-decoder models (e.g. for machine translation (MT) or speech recognition (AS...
Language models significantly benefit from context tokens, such as prompts or scratchpads. They perf...
Multi-encoder models are a broad family of context-aware Neural Machine Translation (NMT) systems th...
Recent research has made impressive progress in large-scale multimodal pre-training. In the context ...
In this paper we present a comparison between the linguistic knowledge encoded in the internal repre...
Though achieving impressive results on many NLP tasks, the BERT-like masked language models (MLM) en...