Large language models (LLMs) have shown remarkable reasoning capabilities given chain-of-thought prompts (examples with intermediate reasoning steps). Existing benchmarks measure reasoning ability indirectly, by evaluating accuracy on downstream tasks such as mathematical reasoning. However, it is unclear how these models obtain the answers and whether they rely on simple heuristics rather than the generated chain-of-thought. To enable systematic exploration of the reasoning ability of LLMs, we present a new synthetic question-answering dataset called PrOntoQA, where each example is generated from a synthetic world model represented in first-order logic. This allows us to parse the generated chain-of-thought into symbolic proofs for formal ...
Humans understand language by extracting information (meaning) from sentences, combining it with exi...
Large language models (LLMs) such as GPT-4 have recently demonstrated impressive results ac...
Abstract reasoning is a key ability for an intelligent system. Large language models achieve above-c...
Logical reasoning remains a pivotal component within the realm of artificial intelligence. The recen...
We explore how generating a chain of thought -- a series of intermediate reasoning steps -- signific...
Large language models (LLMs) can perform complex reasoning by generating intermediate reasoning step...
Large language models (LLMs) have a substantial capacity for high-level analogical reasoning: reprod...
Logical reasoning consistently plays a fundamental and significant role in the domains of knowledge ...
Neural-symbolic methods have shown their effectiveness in enhancing the reasoning abilities of large...
Large language models (LMs) beyond a certain scale demonstrate the emergent capability of generatin...
Emergent chain-of-thought (CoT) reasoning capabilities promise to improve performance and explainabi...
Pretrained large language models (LLMs) are widely used in many sub-fields of natural language proce...
To augment language models with the ability to reason, researchers usually prompt or finetune them t...
One way that the current state of the art measures the reasoning ability of transformer-based models...
Large language models (LLMs) have gained enormous attention from both academia and industry, due to ...