Towards Robust and Scalable Large Language Models | ORKG Ask

Related items

What Language Model to Train if You Have One Million GPU Hours?

Scao, Teven Le
Wang, Thomas
Hesslow, Daniel
Saulnier, Lucile
Bekman, Stas
Bari, M Saiful
Biderman, Stella
Elsahar, Hady
Muennighoff, Niklas
Phang, Jason
Press, Ofir
Raffel, Colin
Sanh, Victor
Shen, Sheng
Sutawika, Lintang
Tae, Jaesung
Yong, Zheng Xin
Launay, Julien
Beltagy, Iz

November 2022

The crystallization of modeling methods around the Transformer architecture has been a boon for prac...

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

November 2022

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstr...

A Systematic Evaluation of Large Language Models of Code

Xu, Frank F.
Alon, Uri
Neubig, Graham
Hellendoorn, Vincent J.

May 2022

Large language models (LMs) of code have recently shown tremendous promise in completing code and sy...

Large language models in machine translation

Brants, Thorsten
Popat, Ashok C.
Xu, Peng
Och, Franz J.
Dean, Jeffrey
Google Inc

January 2007

This paper reports on the benefits of large-scale statistical language modeling in machine translatio...

Efficient Large Scale Language Modeling with Mixtures of Experts

Artetxe, Mikel
Bhosale, Shruti
Goyal, Naman
Mihaylov, Todor
Ott, Myle
Shleifer, Sam
Lin, Xi Victoria
Du, Jingfei
Iyer, Srinivasan
Pasunuru, Ramakanth
Anantharaman, Giri
Li, Xian
Chen, Shuohui
Akin, Halil
Baines, Mandeep
Martin, Louis
Zhou, Xing
Koura, Punit Singh
O'Horo, Brian
Wang, Jeff
Zettlemoyer, Luke
Diab, Mona
Kozareva, Zornitsa
Stoyanov, Ves

December 2021

Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional com...

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

June 2022

Language models demonstrate both quantitative improvement and new qualitative capabilities with incr...
