Many sequence-to-sequence tasks in natural language processing are roughly monotonic in the alignment between source and target sequence, and previous work has facilitated or enforced learning of monotonic attention behavior via specialized attention functions or pretraining. In this work, we introduce a monotonicity loss function that is compatible with standard attention mechanisms and test it on several sequence-to-sequence tasks: grapheme-to-phoneme conversion, morphological inflection, transliteration, and dialect normalization. Experiments show that we can achieve largely monotonic behavior. Performance is mixed, with larger gains on top of RNN baselines. General monotonicity does not benefit transformer multihead attention, however, ...
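The abstract does not spell out the form of the monotonicity loss, so the following is only a minimal sketch of the general idea under an assumed formulation: a soft penalty on the attention matrix that discourages the expected source position from moving backwards across target steps, added to the usual cross-entropy objective. The function name, the shape convention, and the weighting factor lam are illustrative choices, not the paper's actual implementation.

import torch

def monotonicity_loss(attn, eps=1e-8):
    """
    Hypothetical monotonicity penalty over soft attention weights.

    attn: tensor of shape (batch, tgt_len, src_len) whose rows sum to 1,
          e.g. the softmax attention of an RNN decoder or of a single
          transformer head.

    The sketch computes the expected source position attended to at each
    target step and penalizes any decrease between consecutive steps,
    i.e. it discourages the attention centre from jumping backwards.
    """
    batch, tgt_len, src_len = attn.shape
    positions = torch.arange(src_len, dtype=attn.dtype, device=attn.device)
    # Expected source position per target step: shape (batch, tgt_len)
    expected_pos = (attn * positions).sum(dim=-1)
    # Backward jumps between consecutive target steps (0 where monotone)
    backward = (expected_pos[:, :-1] - expected_pos[:, 1:]).clamp(min=0.0)
    return backward.mean()

if __name__ == "__main__":
    # Toy usage: mix the penalty into the usual training objective.
    attn = torch.softmax(torch.randn(2, 5, 7), dim=-1)
    ce_loss = torch.tensor(1.234)   # placeholder for the cross-entropy term
    lam = 0.1                       # weight of the monotonicity term (assumed)
    total = ce_loss + lam * monotonicity_loss(attn)
    print(float(total))

Because the penalty only reads the attention weights, a sketch like this is compatible with standard attention mechanisms; biasing only a subset of transformer heads would amount to summing the penalty over the chosen heads rather than over all of them.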