Modern language models mostly take sub-words as input, a design that balances the trade-off between vocabulary size, number of parameters, and performance. However, sub-word tokenization still has disadvantages, such as a lack of robustness to noise and difficulty generalizing to new languages. Moreover, the current trend of scaling up models shows that larger models require larger embeddings, which makes parallelization hard. Previous work on image classification shows that splitting raw input into a sequence of chunks is a strong, model-agnostic inductive bias. Based on this observation, we rethink the existing character-aware method that takes character-level inputs but performs word-level sequence modeling and prediction. We overhaul this method by in...
Verwimp L., Pelemans J., Van hamme H., Wambacq P., "Character-word LSTM language models", Proceedi...
Character-level language models obviate the need for separately trained tokenizers, but efficiency s...
Language models typically tokenize text into subwords, using a deterministic, hand-engineered heuris...
Language Modeling (LM) is a complex task that has typically been addressed with word-level RNNs and...
Neural architectures are prominent in the construction of language models (LMs). However, word-leve...
What are the units of text that we want to model? From bytes to multi-word expressio...
Language models are defined over a finite set of inputs, which creates a vocabulary bottleneck when ...
The Backpack is a Transformer alternative shown to improve interpretability in English language mode...
Thesis (Ph.D.)--University of Washington, 2023. Language models (LMs) are at the core of almost all st...
In Neural Machine Translation, using word-level tokens leads to degradation in translation quality. ...
Subword units are commonly used for end-to-end automatic speech recognition (ASR), while a fully aco...
Recently, the development of pre-trained language models has brought natural language processing (NL...
We look at a decision taken early in training a subword tokenizer, namely whether it should be the w...
Almost all existing machine translation models are built on top of character-based vocabularies: cha...