Modern language models leverage increasingly large numbers of parameters to achieve strong performance on natural language understanding tasks. Ensembling these models in specific configurations for downstream tasks shows even further performance improvements. In this paper, we analyze bagging for language models, comparing single language models to bagged ensembles of roughly equivalent final model size. We explore an array of model bagging configurations for natural language understanding tasks, with final ensemble sizes ranging from 300M to 1.5B parameters, and determine that our ensembling methods are at best roughly equivalent to single LM baselines. We note other positive effects of bagging and pruning i...
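To make the setup concrete, the sketch below shows the standard bagging recipe that the abstract describes at a high level: each ensemble member is trained on its own bootstrap replicate of the task data, and at inference time the members' class probabilities are averaged before taking the argmax. Everything here (TinyClassifier, bootstrap_sample, the toy sizes) is hypothetical scaffolding, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyClassifier:
    """Stand-in for a fine-tuned LM classification head (hypothetical)."""
    def __init__(self, weights):
        self.weights = weights                 # (n_features, n_classes)

    def predict_proba(self, x):
        logits = x @ self.weights
        e = np.exp(logits - logits.max())      # numerically stable softmax
        return e / e.sum()

def bootstrap_sample(n, rng):
    """Indices for one bagging replicate (sampling with replacement).
    In a real bagging setup, each ensemble member is trained on its own
    replicate; training itself is omitted from this sketch."""
    return rng.integers(0, n, size=n)

def ensemble_predict(models, x):
    """Bagged prediction: average class probabilities, then argmax."""
    probs = np.mean([m.predict_proba(x) for m in models], axis=0)
    return int(np.argmax(probs))

# Toy usage: three 'models' with different weights, one 4-dim input.
models = [TinyClassifier(rng.normal(size=(4, 3))) for _ in range(3)]
print(ensemble_predict(models, rng.normal(size=4)))
```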
We first present skip n-grams interpolated with various other n-grams and measure their ability to i...
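As a rough illustration of the interpolation idea in the line above, the following sketch mixes a maximum-likelihood skip-bigram estimate (conditioning on the word two positions back) with a standard bigram estimate. The corpus, the counts, and the weight lam=0.7 are invented for the example, and edge effects in the denominators are ignored.

```python
from collections import Counter

tokens = "the cat sat on the mat the cat lay on the mat".split()

# Standard bigram counts pair (w_{i-1}, w_i); skip-bigram counts skip
# one position and pair (w_{i-2}, w_i). Both are simple MLE estimates.
bigram = Counter(zip(tokens, tokens[1:]))
skip = Counter(zip(tokens, tokens[2:]))
unigram = Counter(tokens)

def p_bigram(prev, w):
    return bigram[(prev, w)] / unigram[prev]

def p_skip(prev2, w):
    return skip[(prev2, w)] / unigram[prev2]

def p_interp(prev2, prev, w, lam=0.7):
    """Linear interpolation of bigram and skip-bigram estimates."""
    return lam * p_bigram(prev, w) + (1 - lam) * p_skip(prev2, w)

print(p_interp("on", "the", "mat"))   # 0.7 * 0.5 + 0.3 * 1.0 = 0.65
```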
Large multilingual language models typically rely on a single vocabulary shared across 100+ language...
Language modeling has been widely used in natural language processing applications, and there...
When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown e...
Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional com...
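The conditional computation that MoE layers rely on can be sketched in a few lines: a gating network scores the experts for each token, and only the top-k experts are actually evaluated. The sizes, the top-1 routing, and the linear "experts" below are illustrative assumptions, not any particular system's design.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 8, 4, 1          # illustrative sizes, top-1 routing

W_gate = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # toy FFNs

def moe_layer(x):
    """Route a token to its top-k experts; skip the rest entirely."""
    scores = x @ W_gate                    # (n_experts,) gating logits
    top = np.argsort(scores)[-k:]          # indices of selected experts
    gates = np.exp(scores[top] - scores[top].max())
    gates /= gates.sum()                   # renormalized gate weights
    # Only the selected experts run: this is the conditional computation.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

print(moe_layer(rng.normal(size=d)).shape)   # (8,)
```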
Scaling existing applications and solutions to multiple human languages has traditionally proven to ...
Scaling language models with more data, compute and parameters has driven significant progress in na...
Thesis (Ph.D.), University of Washington, 2023. Language models (LMs) are at the core of almost all st...
How cross-linguistically applicable are NLP models, specifically language models? A fair comparison ...
Multilingual models are often particularly dependent on scaling to generalize to a growing number of...
Multilingual language models are widely used to extend NLP systems to low-resource languages. Howeve...
Language models demonstrate both quantitative improvement and new qualitative capabilities with incr...
Real-world business applications require a trade-off between language model performance and size. We...
Despite their wide adoption, the underlying training and memorization dynamics of very large languag...
State-of-the-art natural language processing models are anything but compact. Syntactic parsers have...