Large-scale, pre-trained language models (LMs) have achieved human-level performance on a breadth of language understanding tasks. However, evaluations only based on end task performance shed little light on machines' true ability in language understanding and reasoning. In this paper, we highlight the importance of evaluating the underlying reasoning process in addition to end performance. Toward this goal, we introduce Tiered Reasoning for Intuitive Physics (TRIP), a novel commonsense reasoning dataset with dense annotations that enable multi-tiered evaluation of machines' reasoning process. Our empirical results show that while large LMs can achieve high end performance, they struggle to support their predictions with valid supporting ev...
This thesis develops formal computational models of intuitive theories, in particular intuitive phys...
Abstract reasoning is a key ability for an intelligent system. Large language models achieve above-c...
Large language models (LLMs) have shown remarkable reasoning capabilities given chain-of-thought pro...
Language models have achieved remarkable performance on a wide range of tasks that require natural l...
Logical reasoning consistently plays a fundamental and significant role in the domains of knowledge ...
Thesis (Ph.D.)--University of Washington, 2020For machines to understand language, they must intuiti...
Recent years have brought about a renewed interest in commonsense representation and reasoning in th...
Conceptualization, or viewing entities and situations as instances of abstract concepts in mind and ...
Natural Language Inference (NLI) is considered a representative task to test natural language unders...
The development of highly fluent large language models (LLMs) has prompted increased interest in ass...
Humans understand language by extracting information (meaning) from sentences, combining it with exi...
Large Language Models (LLMs) have demonstrated remarkable performance on various quantitative reason...
Contextualized representations trained over large raw text data have given remarkable improvements f...
Abstract Large language models (LLMs) such as GPT-4 have recently demonstrated impressive results ac...
Human beings are known for their remarkable ability to comprehend, analyze, and interpret commonsens...
This thesis develops formal computational models of intuitive theories, in particular intuitive phys...
Abstract reasoning is a key ability for an intelligent system. Large language models achieve above-c...
Large language models (LLMs) have shown remarkable reasoning capabilities given chain-of-thought pro...
Language models have achieved remarkable performance on a wide range of tasks that require natural l...
Logical reasoning consistently plays a fundamental and significant role in the domains of knowledge ...
Thesis (Ph.D.)--University of Washington, 2020For machines to understand language, they must intuiti...
Recent years have brought about a renewed interest in commonsense representation and reasoning in th...
Conceptualization, or viewing entities and situations as instances of abstract concepts in mind and ...
Natural Language Inference (NLI) is considered a representative task to test natural language unders...
The development of highly fluent large language models (LLMs) has prompted increased interest in ass...
Humans understand language by extracting information (meaning) from sentences, combining it with exi...
Large Language Models (LLMs) have demonstrated remarkable performance on various quantitative reason...
Contextualized representations trained over large raw text data have given remarkable improvements f...
Abstract Large language models (LLMs) such as GPT-4 have recently demonstrated impressive results ac...
Human beings are known for their remarkable ability to comprehend, analyze, and interpret commonsens...
This thesis develops formal computational models of intuitive theories, in particular intuitive phys...
Abstract reasoning is a key ability for an intelligent system. Large language models achieve above-c...
Large language models (LLMs) have shown remarkable reasoning capabilities given chain-of-thought pro...