Planning to be incremental: Scene descriptions reveal meaningful clustering in language production

IF 2.8 · Zone 1 (Psychology) · Q1 PSYCHOLOGY, EXPERIMENTAL

Cognition · Published 2025-09-22 · DOI: 10.1016/j.cognition.2025.106330

Karina Tachihara, Madison Barker, Beverly Cotter, Taylor Hayes, John Henderson, Adrian Zhou, Fernanda Ferreira

Cognition, Volume 266, Article 106330 · https://www.sciencedirect.com/science/article/pii/S0010027725002719

Abstract

How do speakers plan complex descriptions and then execute those plans? In this work, we attempt to answer this question by asking subjects to describe complex visual scenes. We posit that speakers begin planning by organizing the scene into meaningful clusters, or groupings of objects. Speakers then describe the scene cluster by cluster, allowing some planning time between clusters. To test these ideas, in a preregistered study, 30 participants described 30 indoor and outdoor scenes while their speech was recorded. Physical distance was calculated by identifying the centroid of each object and then computing the Euclidean distance between centroids for every object pair. Semantic distance was calculated using ConceptNet Numberbatch to obtain the semantic similarity between object labels. A clustering algorithm was then applied to establish the appropriate number of clusters per scene and to assign objects to each cluster. Consistent with our hypothesis, objects separated by shorter physical distances and objects that were semantically more similar were mentioned closer together in time in the verbal descriptions. In addition, words that involved jumping from one cluster to another took longer to initiate than words that stayed within the same cluster. We conclude that speakers address the linearization problem by establishing clusters of objects and using them to facilitate incremental planning. This approach treats multi-utterance language production as a type of foraging behavior in which people balance exploration and exploitation.
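The physical-distance measure described in the abstract (a centroid per object, then the Euclidean distance between centroids for every pair) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline. It assumes that each object is given as a list of (x, y) pixel coordinates, for example the pixels of a segmentation mask, and it takes the centroid as their arithmetic mean. The scene, the labels, and the helper names `centroid` and `pairwise_distances` are hypothetical.

```python
import math
from itertools import combinations

Point = tuple[float, float]


def centroid(points: list[Point]) -> Point:
    """Arithmetic mean of an object's (x, y) coordinates (assumed centroid definition)."""
    if not points:
        raise ValueError("object has no coordinates")
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def pairwise_distances(objects: dict[str, list[Point]]) -> dict[tuple[str, str], float]:
    """Euclidean distance between centroids for every unordered object pair.

    Pairs are keyed by alphabetically sorted labels, so each pair appears once.
    """
    cents = {label: centroid(pts) for label, pts in objects.items()}
    return {
        (a, b): math.dist(cents[a], cents[b])
        for a, b in combinations(sorted(cents), 2)
    }


# Hypothetical scene: each label maps to a few pixel coordinates.
scene = {
    "lamp": [(10, 10), (12, 10), (10, 14), (12, 14)],  # centroid (11, 12)
    "sofa": [(40, 50), (60, 50), (40, 60), (60, 60)],  # centroid (50, 55)
    "cushion": [(48, 52), (52, 52)],                   # centroid (50, 52)
}
dists = pairwise_distances(scene)
print(dists[("cushion", "sofa")])  # 3.0
```

Keying each pair by its sorted labels keeps the table free of duplicates, which makes it easy to line up with a semantic-distance table that uses the same convention.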
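ConceptNet Numberbatch supplies a word vector for each label, and similarity between two word vectors is commonly measured as the cosine of the angle between them. The sketch below assumes exactly that. It also assumes a conversion to a distance of 1 minus the cosine. The abstract states neither detail. The three-dimensional vectors are invented placeholders (real Numberbatch vectors have 300 dimensions), and the helper names are hypothetical.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_u * norm_v)


def semantic_distance(u: list[float], v: list[float]) -> float:
    """Assumed conversion: distance = 1 - cosine similarity (range 0 to 2)."""
    return 1.0 - cosine_similarity(u, v)


# Toy vectors standing in for Numberbatch embeddings (values are invented).
vectors = {
    "sofa": [0.9, 0.1, 0.0],
    "cushion": [0.8, 0.2, 0.0],
    "tree": [0.0, 0.1, 0.9],
}
closer = semantic_distance(vectors["sofa"], vectors["cushion"])
farther = semantic_distance(vectors["sofa"], vectors["tree"])
print(closer < farther)  # True
```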
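The abstract does not name the clustering algorithm that set the number of clusters per scene. As a stand-in, here is single-linkage agglomerative clustering cut at a distance threshold, implemented with union-find. Any two objects within the cutoff of each other end up in the same cluster, so the number of clusters follows from the cutoff rather than being fixed in advance. The input can be any pairwise distance table (physical, semantic, or a weighted mix). The cutoff value, the distances, and the function name are all hypothetical.

```python
def single_linkage_clusters(
    labels: list[str],
    dist: dict[tuple[str, str], float],
    cutoff: float,
) -> dict[str, int]:
    """Connected components of the graph linking pairs with distance <= cutoff.

    This is equivalent to cutting a single-linkage dendrogram at `cutoff`.
    """
    parent = {label: label for label in labels}

    def find(x: str) -> str:
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in dist.items():
        if d <= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    # Number clusters 0, 1, 2, ... in alphabetical order of their first member.
    assignment: dict[str, int] = {}
    root_to_id: dict[str, int] = {}
    for label in sorted(labels):
        root = find(label)
        if root not in root_to_id:
            root_to_id[root] = len(root_to_id)
        assignment[label] = root_to_id[root]
    return assignment


# Hypothetical centroid distances (pixels) for four objects.
dist = {
    ("cushion", "desk"): 45.0,
    ("cushion", "lamp"): 40.0,
    ("cushion", "sofa"): 3.0,
    ("desk", "lamp"): 5.0,
    ("desk", "sofa"): 44.0,
    ("lamp", "sofa"): 42.0,
}
clusters = single_linkage_clusters(["sofa", "cushion", "lamp", "desk"], dist, cutoff=10.0)
print(clusters)  # {'cushion': 0, 'desk': 1, 'lamp': 1, 'sofa': 0}
```

Other procedures would choose the number of clusters by a different rule, for example k-means scored with a fit criterion. The threshold version is used here only because it is short and easy to verify.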
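Given a cluster assignment and word timings, the cluster-switch effect can be illustrated descriptively by comparing the pauses before within-cluster mentions with the pauses before between-cluster mentions. This sketch assumes that each mention is a (label, onset, offset) tuple in seconds. It defines initiation latency as the gap from the previous mention's offset to the current mention's onset. The abstract specifies neither the latency measure nor the statistical model, so the measure, the toy timings, and the function name are all assumptions.

```python
from statistics import mean

Mention = tuple[str, float, float]  # (label, onset_s, offset_s): hypothetical format


def transition_latencies(
    mentions: list[Mention], assignment: dict[str, int]
) -> tuple[list[float], list[float]]:
    """Split pre-word gaps into within-cluster and between-cluster transitions."""
    within: list[float] = []
    between: list[float] = []
    for (prev_label, _, prev_offset), (cur_label, cur_onset, _) in zip(mentions, mentions[1:]):
        gap = cur_onset - prev_offset  # assumed latency measure
        if assignment[prev_label] == assignment[cur_label]:
            within.append(gap)
        else:
            between.append(gap)
    return within, between


assignment = {"sofa": 0, "cushion": 0, "lamp": 1, "desk": 1}
mentions = [
    ("sofa", 0.0, 0.5),
    ("cushion", 0.7, 1.2),
    ("lamp", 2.0, 2.4),  # jump to a new cluster
    ("desk", 2.6, 3.0),
]
within, between = transition_latencies(mentions, assignment)
print(round(mean(within), 3), round(mean(between), 3))  # 0.2 0.8
```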

Source journal: Cognition (PSYCHOLOGY, EXPERIMENTAL)

CiteScore: 6.40
Self-citation rate: 5.90%
Articles published: 283

About the journal: Cognition is an international journal that publishes theoretical and experimental papers on the study of the mind. It covers a wide variety of subjects concerning all the different aspects of cognition, ranging from biological and experimental studies to formal analysis. Contributions from the fields of psychology, neuroscience, linguistics, computer science, mathematics, ethology and philosophy are welcome in this journal provided that they have some bearing on the functioning of the mind. In addition, the journal serves as a forum for discussion of social and political aspects of cognitive science.