Representational spaces learned via language modeling are fundamental to Natural Language Processing (NLP); however, there is limited understanding of how and when during training various types of linguistic information emerge and interact. Leveraging a novel information-theoretic probing suite, which enables direct comparisons not only of task performance but also of the tasks' representational subspaces, we analyze nine tasks covering syntax, semantics, and reasoning, across 2M pre-training steps and five seeds. We identify critical learning phases across tasks and time, during which subspaces emerge, share information, and later disentangle to specialize. Across these phases, syntactic knowledge is acquired rapidly after 0.5% of full training.
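The abstract leaves the probing machinery implicit. As a rough illustration only, the sketch below combines two standard ingredients that an information-theoretic probing suite of this kind might build on: an online (prequential) MDL code length for a linear probe over frozen LM representations, and a principal-angle similarity between two tasks' probe subspaces. The function names, the chunk schedule, and the use of scikit-learn's LogisticRegression are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def online_code_length(X, y, n_classes, chunks=(0.1, 0.2, 0.4, 0.8, 1.0)):
    """Online (prequential) MDL code length, in bits, of labels y given representations X.

    A linear probe is refit on growing prefixes of the data, and each new chunk is
    scored under the previous fit, so representations that encode the task well
    yield shorter code lengths. Chunk fractions are an illustrative choice.
    """
    n = len(y)
    bounds = [max(1, int(n * f)) for f in chunks]
    # The first chunk is coded under a uniform prior over the label set.
    total_bits = bounds[0] * np.log2(n_classes)
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        clf = LogisticRegression(max_iter=1000).fit(X[:lo], y[:lo])
        # Scatter predicted probabilities into columns indexed by class id,
        # so classes absent from the training prefix get probability ~0.
        proba = np.zeros((hi - lo, n_classes))
        proba[:, clf.classes_] = clf.predict_proba(X[lo:hi])
        p_true = proba[np.arange(hi - lo), y[lo:hi]]
        total_bits += -np.log2(p_true + 1e-12).sum()
    return total_bits

def subspace_similarity(W_a, W_b):
    """Mean squared cosine of the principal angles between two probes' row spaces.

    W_a, W_b are probe weight matrices of shape (n_classes, hidden_dim),
    e.g. the coef_ attribute of two fitted LogisticRegression probes.
    """
    Qa, _ = np.linalg.qr(W_a.T)  # orthonormal basis of task A's probe subspace
    Qb, _ = np.linalg.qr(W_b.T)  # orthonormal basis of task B's probe subspace
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return float((cosines ** 2).mean())
```

Under these assumptions, the code length would be tracked per task across pre-training checkpoints, while the subspace similarity between, say, a POS probe and a dependency probe would indicate how much representational structure the two tasks share at a given step.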