Pre-training complex language models is essential to the success of recent methods such as BERT and OpenAI GPT. Their size makes not only the pre-training phase but also subsequent applications computationally expensive. BERT-like models excel at token-level tasks because they provide reliable token embeddings, but they fall short when it comes to sentence- or higher-level structure embeddings, since these models have no built-in mechanism that explicitly provides such representations. We introduce Light and Multigranular BERT, which has a similar number of parameters to BERT but is about 3 times faster, achieved by modifying the input representation, which in turn introduces changes to the attention mechanism...
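The abstract above attributes the speedup to a modified (coarser) input representation, which shortens the sequence the attention mechanism operates over. Below is a minimal, purely illustrative sketch of why that helps: self-attention cost grows roughly quadratically with sequence length, so grouping tokens into coarser units cuts the attention-side cost sharply. The sequence length, hidden size, and granularity factor are assumed values for illustration, not figures from the paper, and this is not the paper's actual architecture.

```python
# Illustrative only: rough per-layer self-attention cost for a token-level
# input versus a hypothetical coarser-grained input that is `granularity`
# times shorter. All numeric settings below are assumptions.

def attention_flops(seq_len: int, hidden: int) -> int:
    """Approximate FLOPs of one self-attention layer:
    4 * n * d^2 for the Q/K/V/output projections, plus
    2 * n^2 * d for the two n-by-n attention matrix products."""
    return 4 * seq_len * hidden ** 2 + 2 * seq_len ** 2 * hidden

tokens = 512        # token-level input length (assumed)
hidden = 768        # BERT-base hidden size
granularity = 4     # hypothetical factor by which coarser units shorten the input

fine = attention_flops(tokens, hidden)
coarse = attention_flops(tokens // granularity, hidden)
print(f"token-level input:   {fine:,} FLOPs")
print(f"coarse-grained input: {coarse:,} FLOPs (~{fine / coarse:.1f}x fewer)")
```

The projection cost (4nd^2) scales only linearly with sequence length, so the overall speedup from a shorter input is less than quadratic but can still be substantial, which is consistent in spirit with the reported ~3x figure.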
Can we utilize extremely large monolingual text to improve neural machine translation without the ex...
Large-scale pretraining and task-specific fine-tuning is now the standard methodology for many tasks...
Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural l...
Pre-trained language models have been dominating the field of natural language processing in recent ...
Large pre-trained masked language models have become state-of-the-art solutions for many NLP problem...
Transfer learning is the application of knowledge or patterns learned in a particular field or task to differe...
Large pretrained masked language models have become state-of-the-art solutions for many NLP problems...
Recently, the development of pre-trained language models has brought natural language processing (NL...
Large pretrained masked language models have become state-of-the-art solutions for many NLP problems...
Despite the widespread use of pre-trained models in NLP, well-performing pre-trained mo...
In this position statement, we wish to contribute to the discussion about how to assess quality and ...
Currently, the most widespread neural network architecture for training language models is the so-ca...
Transformer-based masked language models trained on general corpora, such as BERT and RoBERTa, have ...
We propose PromptBERT, a novel contrastive learning method for learning better sentence representati...
Transformer-based language models have become a key building block for natural language processing. ...