{"title":"Dissociable frequency effects attenuate as large language model surprisal predictors improve","authors":"Byung-Doh Oh , William Schuler","doi":"10.1016/j.jml.2025.104645","DOIUrl":null,"url":null,"abstract":"<div><div>Recent psycholinguistic modeling work using surprisal from Transformer-based language models has reported separable effects of frequency and predictability on real-time processing difficulty. However, it has also been shown that as Transformer-based language models become larger and are trained on more data, they are able to predict low-frequency words more accurately, which has a deleterious effect on fit to reading times. This article examines the impact of this property of language models on the dissociability of frequency effects and predictability effects in naturalistic reading. Regression results show robust positive effects of language model size and training data amount on the ability of word frequency to explain variance in held-out reading times as the contribution due to surprisal declines, which suggests a strong compensatory relationship between frequency and language model surprisal. Additionally, an analysis of the learning trajectories of low-frequency tokens reveals that the influence of model size is strongest on the prediction of tokens that are not part of a bigram sequence observed earlier in the context that models can readily copy, which suggests that limitations in model size create pressures toward learning more general associations. Taken together, these results suggest that the observed frequency effects may be due to imperfect estimates of predictability, and may disappear entirely as better-fitting language models are discovered. This further highlights the importance of exploring additional language models as models of human sentence processing.</div></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"143 ","pages":"Article 104645"},"PeriodicalIF":2.9000,"publicationDate":"2025-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of memory and language","FirstCategoryId":"102","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0749596X25000385","RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"LINGUISTICS","Score":null,"Total":0}
Citations: 0
Abstract
Recent psycholinguistic modeling work using surprisal from Transformer-based language models has reported separable effects of frequency and predictability on real-time processing difficulty. However, it has also been shown that as Transformer-based language models become larger and are trained on more data, they are able to predict low-frequency words more accurately, which has a deleterious effect on fit to reading times. This article examines the impact of this property of language models on the dissociability of frequency effects and predictability effects in naturalistic reading. Regression results show robust positive effects of language model size and training data amount on the ability of word frequency to explain variance in held-out reading times as the contribution due to surprisal declines, which suggests a strong compensatory relationship between frequency and language model surprisal. Additionally, an analysis of the learning trajectories of low-frequency tokens reveals that the influence of model size is strongest on the prediction of tokens that are not part of a bigram sequence observed earlier in the context that models can readily copy, which suggests that limitations in model size create pressures toward learning more general associations. Taken together, these results suggest that the observed frequency effects may be due to imperfect estimates of predictability, and may disappear entirely as better-fitting language models are discovered. This further highlights the importance of exploring additional language models as models of human sentence processing.
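The abstract describes comparing how much variance in held-out reading times a word-frequency predictor explains beyond language model surprisal. The sketch below is a minimal illustration of that kind of comparison, not the authors' actual pipeline; the column names, the simulated data, the length control, and the simple train/test split are all assumptions made for illustration.

```python
# Minimal sketch (not the authors' pipeline): does log word frequency explain
# held-out reading-time variance over and above LM surprisal?
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical per-word data: reading time, log unigram frequency, LM surprisal, length.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "rt": rng.gamma(shape=5.0, scale=50.0, size=n),        # reading time (ms)
    "log_freq": rng.normal(loc=-10.0, scale=2.0, size=n),  # log unigram frequency
    "surprisal": rng.gamma(shape=2.0, scale=3.0, size=n),  # LM surprisal (bits)
    "length": rng.integers(1, 12, size=n),                 # word length (characters)
})

# Held-out evaluation: fit on one half, score predictions on the other.
train, test = df.iloc[: n // 2], df.iloc[n // 2:]

# Baseline regression with surprisal and a length control; full model adds frequency.
baseline = smf.ols("rt ~ surprisal + length", data=train).fit()
full = smf.ols("rt ~ surprisal + length + log_freq", data=train).fit()

def held_out_sse(model, data):
    """Sum of squared errors of the model's predictions on held-out reading times."""
    resid = data["rt"] - model.predict(data)
    return float((resid ** 2).sum())

# A lower held-out SSE for the full model indicates that frequency still explains
# variance that this particular surprisal estimate does not capture.
print("baseline SSE:", held_out_sse(baseline, test))
print("full SSE:    ", held_out_sse(full, test))
```

On this logic, the article's finding is that the frequency predictor's added contribution grows as larger, better-trained models' surprisal estimates account for less of the reading-time variance on their own.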
Journal Introduction:
Articles in the Journal of Memory and Language contribute to the formulation of scientific issues and theories in the areas of memory, language comprehension and production, and cognitive processes. Special emphasis is given to research articles that provide new theoretical insights based on a carefully laid empirical foundation. The journal generally favors articles that provide multiple experiments. In addition, significant theoretical papers without new experimental findings may be published.
The Journal of Memory and Language is a valuable tool for cognitive scientists, including psychologists, linguists, and others interested in memory and learning, language, reading, and speech.
Research Areas include:
• Topics that illuminate aspects of memory or language processing
• Linguistics
• Neuropsychology.