Mixture of Experts (MoE) can scale up vision transformers effectively. However, training a large MoE transformer requires prohibitive computational resources. In this paper, we propose Residual Mixture of Experts (RMoE), an efficient training pipeline for MoE vision transformers on downstream tasks, such as segmentation and detection. RMoE achieves results comparable to upper-bound MoE training while introducing only minor additional training cost over the lower-bound non-MoE training pipelines. The efficiency is supported by our key observation: the weights of an MoE transformer can be factored into an input-independent core and an input-dependent residual. Compared with the weight core, the weight residual can be efficiently ...
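To make the core/residual factorization above concrete, here is a minimal sketch (not the paper's implementation) of an MoE feed-forward layer whose expert weights are a shared, input-independent core plus small per-expert residuals, written in PyTorch. Names such as `ResidualMoEFFN`, `d_model`, `d_ff`, and `n_experts`, as well as the top-1 routing and the choice of which parameters stay frozen, are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualMoEFFN(nn.Module):
    """MoE feed-forward layer: each expert's weight is a shared core
    plus a per-expert residual (a sketch of the factorization idea)."""

    def __init__(self, d_model=256, d_ff=1024, n_experts=4):
        super().__init__()
        # Input-independent core shared by all experts, e.g. initialized
        # from a pre-trained dense FFN and kept frozen on downstream tasks.
        self.w_core = nn.Parameter(torch.randn(d_model, d_ff) * 0.02,
                                   requires_grad=False)
        # Input-dependent part: one lightweight residual per expert,
        # the only expert weights updated during fine-tuning here.
        self.w_residual = nn.Parameter(torch.zeros(n_experts, d_model, d_ff))
        self.router = nn.Linear(d_model, n_experts)
        self.w_out = nn.Linear(d_ff, d_model)

    def forward(self, x):                                  # x: (tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)           # (tokens, n_experts)
        expert_idx = gate.argmax(dim=-1)                   # top-1 routing
        # Effective expert weight = shared core + selected residual.
        w_eff = self.w_core + self.w_residual[expert_idx]  # (tokens, d_model, d_ff)
        h = torch.einsum('td,tdf->tf', x, w_eff)
        # Scale by the gate probability so the router receives gradients.
        h = h * gate.gather(-1, expert_idx.unsqueeze(-1))
        return self.w_out(F.gelu(h))


if __name__ == "__main__":
    layer = ResidualMoEFFN()
    tokens = torch.randn(8, 256)
    print(layer(tokens).shape)  # torch.Size([8, 256])
```

Under this factorization, only the per-expert residuals (and the router) need gradient updates on the downstream task, which is what keeps the training cost close to the non-MoE baseline.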
Transformer-based neural models are used in many AI applications. Training these models is expensive...
The current modus operandi in adapting pre-trained models involves updating all the backbone paramet...
Mixture-of-experts (MoE) is becoming popular due to its success in improving the model quality, espe...
Structural re-parameterization is a general training scheme for Convolutional Neural Networks (CNNs)...
Sparse mixture of expert architectures (MoEs) scale model capacity without large increases in traini...
Vision transformers (ViTs) have recently obtained success in many applications, but their intensive ...
There has been an explosion of interest in designing high-performance Transformers. While Transforme...
Deeper Vision Transformers (ViTs) are more challenging to train. We expose a degradation problem in ...
Vision Transformers (ViT) have been shown to attain highly competitive performance for a wide range ...
In the past few years, transformers have achieved promising performances on various computer vision ...
The Mixture of Experts (MoE) is a widely known neural architecture where an ensemble of specialized ...
As the training of giant dense models hits the boundary on the availability and capability of the ha...
More transformer blocks with residual connections have recently achieved impressive results on vario...
Large pre-trained transformers are on top of contemporary semantic segmentation benchmarks, but come...