Self-supervised training methods for transformers have demonstrated remarkable performance across various domains. Previous transformer-based models, such as masked autoencoders (MAE), typically use a single normalization layer for both the [CLS] symbol and the tokens. In this paper, we propose a simple modification that employs separate normalization layers for the tokens and the [CLS] symbol to better capture their distinct characteristics and enhance downstream task performance. Our method aims to alleviate the potential negative effects of using the same normalization statistics for both token types, which may not be well aligned with their individual roles. We empirically show that utilizing a separate normalization layer for the [CLS] symbol leads to improved performance on downstream tasks.
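As a concrete illustration, the sketch below applies separate normalization to the [CLS] position and to the remaining tokens inside a transformer encoder. This is a minimal PyTorch sketch, assuming a sequence layout with the [CLS] embedding at index 0 and LayerNorm as the normalization; the module name `SeparateNorm` and the example shapes are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn as nn


class SeparateNorm(nn.Module):
    """Normalize the [CLS] position and the other tokens with two
    independent LayerNorm modules (illustrative sketch only)."""

    def __init__(self, dim: int):
        super().__init__()
        self.cls_norm = nn.LayerNorm(dim)    # learnable scale/shift for the [CLS] position
        self.token_norm = nn.LayerNorm(dim)  # learnable scale/shift for all remaining tokens

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1 + num_tokens, dim), with the [CLS] embedding at index 0 (assumed layout)
        cls_tok, tokens = x[:, :1], x[:, 1:]
        return torch.cat([self.cls_norm(cls_tok), self.token_norm(tokens)], dim=1)


if __name__ == "__main__":
    # Example shapes only: a ViT-B/16-like sequence of [CLS] + 196 patch tokens, width 768.
    x = torch.randn(8, 197, 768)
    norm = SeparateNorm(768)
    print(norm(x).shape)  # torch.Size([8, 197, 768])
```

Under these assumptions, a module like this can stand in for the shared normalization layer wherever it appears in the encoder, so the [CLS] path and the token path each learn their own normalization parameters.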