Cassandra L. Jacobs, Ryan J. Hubbard, Loïc Grobol, Kara D. Federmeier
Journal of Memory and Language, volume 144, Article 104653. Published 2025-05-23. DOI: 10.1016/j.jml.2025.104653. https://www.sciencedirect.com/science/article/pii/S0749596X25000464
Uncovering patterns of semantic predictability in sentence processing
Psycholinguistic researchers often collect cloze probabilities in order to measure the predictability of upcoming words but have largely discarded the variability in the structure of responses people provide. This variability in the semantic structure of responses may be important for understanding selection during language production; however, it has proven difficult to model the semantic variability of participants’ responses, and thus upcoming semantic uncertainty. Recent advances in large language models (LLMs) permit us to approximate the degree of semantic variability in cloze responses, but most methods are restricted to symbolic or hand-crafted meaning representations. We show in two studies that Bayesian Gaussian mixture models can cluster LLM representations of participants’ responses and produce coherent, taxonomically similar clusters. We apply these clustering algorithms to response time data in a serial cloze task and show that the semantic structure of cloze responses influences how quickly people are able to provide a response. We show clear effects of semantic competition on production speed. In addition to providing novel operationalizations of what semantic competition might look like in the cloze task, we explain how this clustering method is extensible to other datasets and applications of interest to researchers of semantic processing in psycholinguistics.
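The clustering step described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn's `BayesianGaussianMixture` and uses synthetic 2-D vectors as a hypothetical stand-in for LLM representations of cloze responses. The embedding model, dimensionality, and hyperparameters the authors actually used are not stated here, so every setting below is an assumption for illustration only.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Hypothetical stand-in for LLM embeddings of cloze responses: two
# well-separated groups of synthetic 2-D vectors (real embeddings would
# have many more dimensions and come from a language model).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(40, 2))
group_b = rng.normal(loc=10.0, scale=0.5, size=(40, 2))
embeddings = np.vstack([group_a, group_b])

# n_components is only an upper bound (an assumed value); the
# Dirichlet-process prior lets the model leave unneeded components
# with near-zero weight.
model = BayesianGaussianMixture(
    n_components=5,
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=0,
)
labels = model.fit_predict(embeddings)

# Count how many components were assigned at least one response.
n_used = len(set(labels.tolist()))
print("clusters in use:", n_used)
```

Because the number of occupied components is inferred from the data rather than fixed in advance, a sentence frame whose responses fall into one tight semantic group and a frame whose responses spread over several groups would yield different cluster counts, which is one way the semantic variability the abstract refers to might be quantified.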
About the journal:
Articles in the Journal of Memory and Language contribute to the formulation of scientific issues and theories in the areas of memory, language comprehension and production, and cognitive processes. Special emphasis is given to research articles that provide new theoretical insights based on a carefully laid empirical foundation. The journal generally favors articles that provide multiple experiments. In addition, significant theoretical papers without new experimental findings may be published.
The Journal of Memory and Language is a valuable tool for cognitive scientists, including psychologists, linguists, and others interested in memory and learning, language, reading, and speech.
Research Areas include:
• Topics that illuminate aspects of memory or language processing
• Linguistics
• Neuropsychology.