Exceptions, Instantiations, and Overgeneralization: Insights into How Language Models Process Generics

Impact factor 5.3; Zone 2 (Computer Science)

Computational Linguistics, published 2024-07-30, DOI: 10.1162/coli_a_00530

Emily Allaway, Chandra Bhagavatula, Jena D. Hwang, Kathleen McKeown, Sarah-Jane Leslie

Citations: 0

Abstract

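Large language models (LLMs) have garnered a great deal of attention for their exceptional generative performance on commonsense and reasoning tasks. In this work, we investigate LLMs' capabilities for generalization using a particularly challenging type of statement: generics. Generics express generalizations (e.g., birds can fly) but do so without explicit quantification. They are notable because they generalize over their instantiations (e.g., sparrows can fly) yet hold true even in the presence of exceptions (e.g., penguins do not). For humans, these generic generalizations play a fundamental role in cognition, concept acquisition, and intuitive reasoning. We investigate how LLMs respond to and reason about generics. To this end, we first propose a framework grounded in pragmatics to automatically generate both exceptions and instantiations (collectively, exemplars). We make use of focus, a pragmatic phenomenon that highlights meaning-bearing elements in a sentence, to capture the full range of interpretations of generics across different contexts of use. This allows us to derive precise logical definitions for exemplars and operationalize them to automatically generate exemplars from LLMs. Using our system, we generate a dataset of ∼370k exemplars across ∼17k generics and conduct a human validation of a sample of the generated data. We use our final generated dataset to investigate how LLMs reason about generics. Humans have a documented tendency to conflate universally quantified statements (e.g., all birds can fly) with generics. Therefore, we probe whether LLMs exhibit similar overgeneralization behavior in terms of quantification and in property inheritance. We find that LLMs do show evidence of overgeneralization, although they sometimes struggle to reason about exceptions. Furthermore, we find that LLMs may exhibit similar non-logical behavior to humans when considering property inheritance from generics.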
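
The contrast between a generic and its universally quantified counterpart can be illustrated with a toy sketch. It assumes a small, hand-written table of bird kinds (hypothetical illustrative facts, not the paper's dataset or its generation framework) and uses hypothetical helper names `classify_exemplars` and `universal_holds`. It labels each subkind of "birds can fly" as an instantiation or an exception, and shows that the universal reading "all birds can fly" is falsified by a single exception, which is the inference the generic does not license.

```python
from dataclasses import dataclass

# Hypothetical toy facts: does each kind of bird have the property "can fly"?
# Illustrative only; not drawn from the paper's exemplar dataset.
BIRD_KINDS = {
    "sparrow": True,
    "robin": True,
    "eagle": True,
    "penguin": False,
    "ostrich": False,
}


@dataclass(frozen=True)
class Exemplar:
    kind: str
    label: str  # "instantiation" or "exception"


def classify_exemplars(kinds: dict[str, bool]) -> list[Exemplar]:
    """Label subkinds that have the property as instantiations, the rest as exceptions."""
    return [
        Exemplar(kind, "instantiation" if has_property else "exception")
        for kind, has_property in kinds.items()
    ]


def universal_holds(kinds: dict[str, bool]) -> bool:
    """The universal reading ('all birds can fly') fails if any subkind lacks the property."""
    return all(kinds.values())


exemplars = classify_exemplars(BIRD_KINDS)
instantiations = [e.kind for e in exemplars if e.label == "instantiation"]
exceptions = [e.kind for e in exemplars if e.label == "exception"]

print(instantiations)                # ['sparrow', 'robin', 'eagle']
print(exceptions)                    # ['penguin', 'ostrich']
print(universal_holds(BIRD_KINDS))   # False
```

The sketch deliberately models only the universal reading; it does not try to decide when the generic itself is true, since generics are stated without explicit quantification.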

Source journal

Computational Linguistics (Computer Science: Artificial Intelligence)

Self-citation rate: 0.00%

About the journal: Computational Linguistics is the longest-running publication devoted exclusively to the computational and mathematical properties of language and the design and analysis of natural language processing systems. This highly regarded quarterly offers university and industry linguists, computational linguists, artificial intelligence and machine learning researchers, cognitive scientists, speech specialists, and philosophers the latest information on the computational aspects of all facets of research on language.