Transformer-based masked language models trained on general corpora, such as BERT and RoBERTa, have shown impressive performance on various downstream tasks. Increasingly, researchers are "finetuning" these models to improve performance on domain-specific tasks. Here, we report a broad study in which we applied 14 transformer-based models to 11 scientific tasks in order to evaluate how downstream performance is affected by changes along various dimensions (e.g., training data, model size, pretraining time, finetuning length). In this process, we created the largest and most diverse scientific language model to date, ScholarBERT, by training a 770M-parameter BERT model on a 221B-token scientific literature dataset spanning many disciplines....
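The abstract above centers on finetuning pretrained masked language models for downstream tasks. As a rough illustration of what that workflow looks like, here is a minimal sketch using the Hugging Face Transformers library; the `bert-base-uncased` checkpoint, the toy texts, and the binary labels are placeholders chosen for the example, not the models, datasets, or tasks evaluated in the study.

```python
# Hedged sketch: finetuning a pretrained BERT-style encoder on a downstream
# classification task. Checkpoint, data, and labels are illustrative only.
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "bert-base-uncased"              # placeholder; any BERT-style model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

texts = ["protein folding is hard", "stock markets fell today"]   # toy inputs
labels = torch.tensor([1, 0])                 # toy binary labels

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                            # a few passes over the toy batch
    out = model(**batch, labels=labels)       # returns a loss when labels are given
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```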
Large Language Models have become the core architecture upon which most modern natural language proc...
Scaling language models with more data, compute and parameters has driven significant progress in na...
Distilling state-of-the-art transformer models into lightweight student models is an effective way t...
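The distillation abstract above is truncated, but the underlying technique is standard: train a lightweight student against the teacher's temperature-softened output distribution in addition to the gold labels. The sketch below shows that generic Hinton-style objective under assumed hyperparameters (temperature `T`, mixing weight `alpha`); it is not the specific recipe from that paper.

```python
# Hedged sketch of a generic knowledge-distillation loss (soft targets + hard labels).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                # rescale gradients for the temperature
    # Hard-target term: ordinary cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random logits for a batch of 4 examples and 3 classes.
student = torch.randn(4, 3)
teacher = torch.randn(4, 3)
gold = torch.tensor([0, 2, 1, 0])
loss = distillation_loss(student, teacher, gold)
```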
Recently, the development of pre-trained language models has brought natural language processing (NL...
This paper describes the models developed by the AILAB-Udine team for the SMM4H 22 Shared Task. We e...
Language models demonstrate both quantitative improvement and new qualitative capabilities with incr...
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
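Since the BitFit abstract describes finetuning only a model's bias terms, a short sketch may help make the idea concrete: freeze every parameter whose name does not contain "bias" and optimize the rest. The checkpoint name and the choice to also leave the new classification head trainable are assumptions for illustration, not taken from the paper.

```python
# Hedged sketch of bias-only (BitFit-style) finetuning with a BERT-style model.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2          # placeholder checkpoint and task size
)

trainable = []
for name, param in model.named_parameters():
    # Keep bias terms (and, here, the freshly initialized classifier head) trainable.
    if "bias" in name or name.startswith("classifier"):
        param.requires_grad = True
        trainable.append(name)
    else:
        param.requires_grad = False

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
print(f"training {len(trainable)} of {len(list(model.named_parameters()))} parameter tensors")
```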
Pre-trained language models have been dominating the field of natural language processing in recent ...
The Bidirectional Encoder Representations from Transformers (BERT) is currently one of the most impo...
Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional com...
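The MoE abstract is cut off at "conditional com[putation]", but the general mechanism is a learned router that activates only a few experts per token. The following sketch of a top-k-routed MoE layer is a generic illustration under assumed sizes (`d_model`, `n_experts`, `k`); it is not the architecture from that paper.

```python
# Hedged sketch of conditional computation in a Mixture-of-Experts layer:
# a router sends each token to its top-k experts and combines their outputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=4, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                              # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)     # routing probabilities
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run on each token (conditional computation).
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += topk_scores[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = TopKMoE()
tokens = torch.randn(10, 64)                           # 10 tokens, d_model = 64
print(moe(tokens).shape)                               # torch.Size([10, 64])
```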
In this position statement, we wish to contribute to the discussion about how to assess quality and ...
Thesis (Ph.D.)--University of Washington, 2023. Language models (LMs) are at the core of almost all st...
Recent trends in language modeling have focused on increasing performance through scaling, and have ...
In the last five years, the rise of the self-attentional Transformer-based architectures led to stat...
Transformer-based language models have become a key building block for natural language processing. ...