Parameter-efficient tuning has been a prominent approach to adapting large language models to downstream tasks. Most previous work considers adding dense trainable parameters, where all parameters are used to adapt a given task. Using LoRA as an example, we find empirically that this is less effective: introducing more trainable parameters does not help. Motivated by this, we investigate the importance of leveraging "sparse" computation and propose SiRA: sparse mixture of low-rank adaptation. SiRA leverages the Sparse Mixture of Experts (SMoE) to boost the performance of LoRA. Specifically, it enforces top-$k$ expert routing with a capacity limit restricting the maximum number of tokens each expert can process. We propose a novel and simp...
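As a rough illustration of the mechanism described above (not the authors' implementation), the sketch below wires a set of LoRA-style low-rank adapters behind a top-k gate with a per-expert capacity limit. The module name (SparseLoRAMoE), the hyperparameters (rank, num_experts, top_k, capacity), and the simple first-come-first-served capacity rule are all illustrative assumptions.

```python
# Minimal sketch, assuming a LoRA-style delta computed by a sparse mixture of
# low-rank experts with top-k routing and a per-expert token capacity limit.
# Names and hyperparameters are hypothetical, not taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseLoRAMoE(nn.Module):
    def __init__(self, d_in, d_out, rank=4, num_experts=8, top_k=2, capacity=64):
        super().__init__()
        self.top_k = top_k
        self.capacity = capacity  # max number of tokens each expert may process
        self.gate = nn.Linear(d_in, num_experts, bias=False)
        # Each expert is a low-rank (LoRA-style) adapter: x @ down[e] @ up[e].
        self.down = nn.Parameter(torch.randn(num_experts, d_in, rank) * 0.01)
        self.up = nn.Parameter(torch.zeros(num_experts, rank, d_out))

    def forward(self, x):  # x: (num_tokens, d_in)
        logits = self.gate(x)                                # (T, E)
        weights, experts = logits.topk(self.top_k, dim=-1)   # (T, k)
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros(x.size(0), self.up.size(-1), device=x.device)
        for e in range(self.down.size(0)):
            # Tokens that routed to expert e in any of their top-k slots.
            token_idx, slot_idx = (experts == e).nonzero(as_tuple=True)
            # Capacity limit: overflow tokens are simply dropped for this expert.
            token_idx = token_idx[: self.capacity]
            slot_idx = slot_idx[: self.capacity]
            if token_idx.numel() == 0:
                continue
            h = x[token_idx] @ self.down[e] @ self.up[e]      # low-rank update
            out[token_idx] += weights[token_idx, slot_idx].unsqueeze(-1) * h
        return out


# Usage: the returned delta would be added to the output of a frozen base layer,
# mirroring how a standard LoRA update is applied.
x = torch.randn(10, 32)
delta = SparseLoRAMoE(d_in=32, d_out=32)(x)
print(delta.shape)  # torch.Size([10, 32])
```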
Sparse mixture of expert architectures (MoEs) scale model capacity without large increases in traini...
Mixture-of-experts (MoE) is becoming popular due to its success in improving the model quality, espe...
Standard fine-tuning of large pre-trained language models (PLMs) for downstream tasks requires updat...
The Mixture of Experts (MoE) is a widely known neural architecture where an ensemble of specialized ...
Sparse Mixture of Experts (MoE) has received great interest due to its promising scaling capabilit...
We present Generalized LoRA (GLoRA), an advanced approach for universal parameter-efficient fine-tun...
In this paper, we present Delta-LoRA, which is a novel parameter-efficient approach to fine-tune lar...
Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnabl...
Gigantic pre-trained models have become central to natural language processing (NLP), serving as the...
The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural La...
In this paper, we move towards combining large parametric models with non-parametric prototypical ne...
Large language models (LLMs) based on transformers have made significant strides in recent years, th...
We consider the problem of accurate sparse fine-tuning of large language models (LLMs), that is, fin...
Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many pri...
Meta-learning is critical for a variety of practical ML systems -- like personalized recommendations...