State-of-the-art NLP systems represent inputs with word embeddings, but these are brittle when faced with Out-of-Vocabulary (OOV) words. To address this issue, we follow the principle of mimick-like models and generate vectors for unseen words by learning the behavior of pre-trained embeddings from the surface form of words alone. We present a simple contrastive learning framework, LOVE, which extends the word representation of an existing pre-trained language model (such as BERT) and makes it robust to OOV words with few additional parameters. Extensive evaluations demonstrate that our lightweight model achieves similar or even better performance than prior competitors, both on original datasets and on corrupted variants.
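To make the idea concrete, the sketch below shows one way a mimick-style contrastive objective of this kind can be set up: a character-level encoder maps a word's surface form to a vector, and an in-batch contrastive loss pulls that vector toward the word's frozen pre-trained embedding while pushing it away from the embeddings of other words in the batch. This is a minimal illustration under assumed choices (the `CharCNNEncoder`, `infonce_loss`, dimensions, and the CNN encoder itself are hypothetical stand-ins), not the authors' actual LOVE implementation.

```python
# Minimal sketch of a mimick-style contrastive objective (PyTorch).
# All names and hyperparameters here are illustrative assumptions,
# not the LOVE authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharCNNEncoder(nn.Module):
    """Encodes a word's surface form (character ids) into a vector."""
    def __init__(self, n_chars, char_dim=64, out_dim=300, kernel=3):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, out_dim, kernel_size=kernel, padding=1)

    def forward(self, char_ids):              # char_ids: (batch, max_word_len)
        x = self.char_emb(char_ids)           # (batch, len, char_dim)
        x = self.conv(x.transpose(1, 2))      # (batch, out_dim, len)
        return x.max(dim=2).values            # max-pool over characters -> (batch, out_dim)

def infonce_loss(pred, target, temperature=0.07):
    """In-batch contrastive loss: each predicted vector should match the
    pre-trained embedding of its own word, not those of the other words."""
    pred = F.normalize(pred, dim=-1)
    target = F.normalize(target, dim=-1)
    logits = pred @ target.t() / temperature            # (batch, batch) similarities
    labels = torch.arange(pred.size(0), device=pred.device)
    return F.cross_entropy(logits, labels)

# Training-step sketch: char_ids hold the surface forms of known words,
# target_emb their frozen pre-trained embeddings (e.g. from BERT or fastText).
encoder = CharCNNEncoder(n_chars=100)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

char_ids = torch.randint(1, 100, (32, 12))   # dummy batch: 32 words, 12 chars each
target_emb = torch.randn(32, 300)            # stand-in for the frozen target vectors

loss = infonce_loss(encoder(char_ids), target_emb)
loss.backward()
optimizer.step()
```

At inference time, such an encoder can produce a vector for any unseen or misspelled word from its characters alone, which is what makes the approach robust to OOV inputs with only a small number of extra parameters.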