This document aims to be a self-contained, mathematically precise overview of transformer architectures and algorithms (*not* results). It covers what transformers are, how they are trained, what they are used for, their key architectural components, and a preview of the most prominent models. The reader is assumed to be familiar with basic ML terminology and simpler neural network architectures such as MLPs.
Comment: 16 pages, 15 algorithms
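As a taste of the core architectural component such an overview covers, scaled dot-product attention can be sketched in a few lines of NumPy. This is a minimal, single-head, batchless illustration (the function and variable names here are illustrative, not taken from any particular paper):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_q, d_k) queries, K: (n_kv, d_k) keys, V: (n_kv, d_v) values.
    Returns (n_q, d_v): each query's convex combination of the values.
    """
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (n_q, n_kv) similarity scores
    weights = softmax(scores)         # rows sum to 1
    return weights @ V
```

Full transformers wrap this operation with learned projections of Q, K, and V, multiple heads, residual connections, and layer normalization, but the arithmetic above is the kernel everything else is built around.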
The field of Natural Language Processing (NLP) has been undergoing a revolution in recent years. Lar...
With the recent developments in the field of Natural Language Processing, there has been a rise in t...
Recently, the development of pre-trained language models has brought natural language processing (NL...
The transformer is a neural network component that can be used to learn useful representations of se...
Transformer-based neural models are used in many AI applications. Training these models is expensive...
Algorithmic generalization in machine learning refers to the ability to learn the underlying algorit...
We propose a synthetic task, LEGO (Learning Equality and Group Operations), that encapsulates the pr...
Transformer networks have seen great success in natural language processing and machine vision, wher...
Transformer based language models exhibit intelligent behaviors such as understanding natural langua...
Despite progress across a broad range of applications, Transformers have limited success in systemat...
The deep learning architecture associated with ChatGPT and related generative AI products is known a...
We show how to "compile" human-readable programs into standard decoder-only transformer models. Our ...
Recent theoretical work has identified surprisingly simple reasoning problems, such as checking if t...
The computation necessary for training Transformer-based language models has skyrocketed in recent y...
Transformer architecture has widespread applications, particularly in Natural Language Processing an...