Modern language models mostly take sub-words as input, a design that balances the trade-off between vocabulary size, number of parameters, and performance. However, sub-word tokenization still has disadvantages, such as a lack of robustness to noise and difficulty generalizing to new languages. Moreover, the current trend of scaling up models shows that larger models require larger embeddings, which makes parallelization hard. Previous work on image classification shows that splitting raw input into a sequence of chunks is a strong, model-agnostic inductive bias. Based on this observation, we rethink the existing character-aware method that takes character-level inputs but performs word-level sequence modeling and prediction. We overhaul this method by in...
Verwimp L., Pelemans J., Van hamme H., Wambacq P., "Character-word LSTM language models", Proceedi...
Character-level language models obviate the need for separately trained tokenizers, but efficiency s...
Language models typically tokenize text into subwords, using a deterministic, hand-engineered heuris...
Language Modeling (LM) is a complex task that has typically been addressed with word-level RNNs and...
Neural architectures are prominent in the construction of language models (LMs). However, word-leve...
What are the units of text that we want to model? From bytes to multi-word expressio...
Language models are defined over a finite set of inputs, which creates a vocabulary bottleneck when ...
The Backpack is a Transformer alternative shown to improve interpretability in English language mode...
Thesis (Ph.D.)--University of Washington, 2023. Language models (LMs) are at the core of almost all st...
In Neural Machine Translation, using word-level tokens leads to degradation in translation quality. ...
Subword units are commonly used for end-to-end automatic speech recognition (ASR), while a fully aco...
Recently, the development of pre-trained language models has brought natural language processing (NL...
We look at a decision taken early in training a subword tokenizer, namely whether it should be the w...
Almost all existing machine translation models are built on top of character-based vocabularies: cha...