Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional computation. This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full fine-tuning. With the exception of fine-tuning, we find MoEs to be substantially more compute efficient. At more modest training budgets, MoEs can match the performance of dense models using $\sim$4 times less compute. This gap narrows at scale, but our largest MoE model (1.1T parameters) consistently outperforms a compute-equivalent dense model (6.7B parameters). Overall, this performance gap varies gr...
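As a rough illustration of the conditional computation described above (not code from the paper), the sketch below shows a minimal top-2 gated Mixture-of-Experts feed-forward layer in which each token is routed to only a few of the available experts; all names here (MoELayer, d_model, n_experts, top_k) are assumptions made for this example.

```python
# Illustrative sketch only (not the paper's implementation): a minimal
# top-2 gated MoE feed-forward layer demonstrating conditional computation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: scores each token against every expert.
        self.gate = nn.Linear(d_model, n_experts)
        # Expert FFNs: only top_k of them run for any given token.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
             for _ in range(n_experts)]
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)        # (num_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # top_k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out


# Example: 16 tokens pass through the layer; each activates 2 of 8 experts.
y = MoELayer()(torch.randn(16, 512))
```

Because only top_k expert networks are evaluated per token, total parameter count can grow with the number of experts while per-token compute stays roughly constant, which is the efficiency trade-off the abstract compares against dense models.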
Mixture of experts (MoE) is a popular technique in deep learning that improves model capacity with c...
Despite their wide adoption, the underlying training and memorization dynamics of very large languag...
Computational language models (LMs), most notably exemplified by the widespread success of OpenAI's ...
As the training of giant dense models hits the boundary on the availability and capability of the ha...
Scaling language models with more data, compute and parameters has driven significant progress in na...
This paper reports on the benefits of large-scale statistical language modeling in machine translatio...
Large language models (LLMs) based on transformers have made significant strides in recent years, th...
Sparse Mixture of Experts (MoE) has received great interest due to its promising scaling capabilit...
Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnabl...
Modern language models leverage increasingly large numbers of parameters to achieve performance on n...
N-gram language models are an essential component in statistical natural language processing systems...
Language models demonstrate both quantitative improvement and new qualitative capabilities with incr...
All-MLP architectures have attracted increasing interest as an alternative to attention-based models...
This dissertation addresses two significant challenges of large language models (LLMs): robustness a...
Sparse mixture of expert architectures (MoEs) scale model capacity without large increases in traini...