Pre-trained contextual representations have led to dramatic performance improvements on a range of downstream tasks. Such performance improvements have motivated researchers to quantify and understand the linguistic information encoded in these representations. In general, researchers quantify the amount of linguistic information through probing, an endeavor which consists of training a supervised model to predict a linguistic property directly from the contextual representations. Unfortunately, this definition of probing has been subject to extensive criticism in the literature, and has been observed to lead to paradoxical and counterintuitive results. In the theoretical portion of this paper, we take the position that the goal of probing ...
A major problem in machine learning is that of inductive bias: how to choose a learner’s hypothesis...
The outstanding performance recently reached by Neural Language Models (NLMs) across many Natural La...
The uniform information density (UID) hypothesis, which posits that speakers behaving optimally tend...
The success of pre-trained contextualized representations has prompted researchers to analyze them f...
Previous work on probing word representations for linguistic knowledge has focused on interpolation ...
Classifiers trained on auxiliary probing tasks are a popular tool to analyze the representations lea...
Probing is a popular method to discern what linguistic information is contained in the representatio...
Over the past decades, natural language processing has evolved from a niche research area into a fast...
While most animals have communication systems, few exhibit such a high level of complexity as human lan...
Exploring inductive biases. Bayesian models as tools for exploring inductive biases: Generalization ...
Recurrent neural networks are capable of learning context-free tasks; however, learning performance i...
In this thesis, we try to build a connection between the two schools by introducing syntactic induct...
Since language models are used to model a wide variety of languages, it is natural to ask whether th...
Through in-context learning (ICL), large-scale language models are effective few-shot learners witho...
The question of how to probe contextual word representations in a way that is principled and useful ...