State-of-the-art encoder-decoder models (e.g. for machine translation (MT) or speech recognition (ASR)) are constructed and trained end-to-end as an atomic unit. No component of the model can be (re-)used without the others. We describe LegoNN, a procedure for building encoder-decoder architectures with decoder modules that can be reused across various MT and ASR tasks, without the need for any fine-tuning. To achieve reusability, the interface between encoder and decoder modules is grounded to a sequence of marginal distributions over a discrete vocabulary pre-defined by the model designer. We present two approaches for ingesting these marginals; one is differentiable, allowing the flow of gradients across the entire network, and the ...
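A minimal sketch, assuming a PyTorch-style implementation, of the differentiable interface this abstract describes: the encoder emits a marginal distribution over a pre-defined vocabulary at each position, and the decoder ingests those marginals as expected embeddings so gradients still flow end-to-end. All module and parameter names below are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn


class MarginalInterface(nn.Module):
    """Projects encoder states to per-position vocabulary marginals."""

    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, enc_states: torch.Tensor) -> torch.Tensor:
        # (batch, src_len, d_model) -> (batch, src_len, vocab_size)
        return self.proj(enc_states).softmax(dim=-1)


class DifferentiableIngestion(nn.Module):
    """Maps marginals back into continuous inputs for the reusable decoder."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)

    def forward(self, marginals: torch.Tensor) -> torch.Tensor:
        # Expected embedding under the marginal distribution; gradients
        # pass through the probabilities back into the encoder.
        return marginals @ self.embed.weight


if __name__ == "__main__":
    batch, src_len, d_model, vocab = 2, 7, 512, 1000
    enc_states = torch.randn(batch, src_len, d_model)
    marginals = MarginalInterface(d_model, vocab)(enc_states)
    dec_inputs = DifferentiableIngestion(vocab, d_model)(marginals)
    print(marginals.shape, dec_inputs.shape)  # (2, 7, 1000) (2, 7, 512)
```

Because the interface is a distribution over a shared, designer-chosen vocabulary rather than an opaque hidden state, a decoder trained behind one encoder can, per the abstract, be reattached to another without fine-tuning.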
Scaling multilingual representation learning beyond the hundred most frequent languages is challengi...
Neural encoder-decoder models for language generation can be trained to predict words directly from ...
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NM...
This thesis introduces the concept of an encoder-decoder neural network and develops architectures f...
The attention-based encoder-decoder is an effective architecture for neural machine translation (NMT),...
A morphologically complex word (MCW) is a hierarchical constituent with meaning-preserving subunits,...
Sharing source and target side vocabularies and word embeddings has been a popular practice in neura...
We investigate input-conditioned hypernetworks for multi-tasking in NLP, generating parameter-effici...
Multi-encoder models are a broad family of context-aware Neural Machine Translation (NMT) systems th...
State-of-the-art multilingual machine translation relies on a universal encoder-decoder, which requi...
Neural network language models are often trained by optimizing likelihood, but we would prefer to op...
In encoder-decoder based sequence-to-sequence modeling, the most common practice is to stack a numbe...
In recent years, deep learning has enabled impressive achievements in Machine Translation. Neural Mac...
Pre-trained language models have received extensive attention in recent years. However, it is still chall...
State-of-the-art multilingual machine translation relies on a shared encoder-decoder. In this paper,...