{"title":"Dissociable frequency effects attenuate as large language model surprisal predictors improve","authors":"Byung-Doh Oh , William Schuler","doi":"10.1016/j.jml.2025.104645","DOIUrl":null,"url":null,"abstract":"<div><div>Recent psycholinguistic modeling work using surprisal from Transformer-based language models has reported separable effects of frequency and predictability on real-time processing difficulty. However, it has also been shown that as Transformer-based language models become larger and are trained on more data, they are able to predict low-frequency words more accurately, which has a deleterious effect on fit to reading times. This article examines the impact of this property of language models on the dissociability of frequency effects and predictability effects in naturalistic reading. Regression results show robust positive effects of language model size and training data amount on the ability of word frequency to explain variance in held-out reading times as the contribution due to surprisal declines, which suggests a strong compensatory relationship between frequency and language model surprisal. Additionally, an analysis of the learning trajectories of low-frequency tokens reveals that the influence of model size is strongest on the prediction of tokens that are not part of a bigram sequence observed earlier in the context that models can readily copy, which suggests that limitations in model size create pressures toward learning more general associations. Taken together, these results suggest that the observed frequency effects may be due to imperfect estimates of predictability, and may disappear entirely as better-fitting language models are discovered. This further highlights the importance of exploring additional language models as models of human sentence processing.</div></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"143 ","pages":"Article 104645"},"PeriodicalIF":2.9000,"publicationDate":"2025-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of memory and language","FirstCategoryId":"102","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0749596X25000385","RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"LINGUISTICS","Score":null,"Total":0}
Citations: 0
Abstract
Recent psycholinguistic modeling work using surprisal from Transformer-based language models has reported separable effects of frequency and predictability on real-time processing difficulty. However, it has also been shown that as Transformer-based language models become larger and are trained on more data, they are able to predict low-frequency words more accurately, which has a deleterious effect on fit to reading times. This article examines the impact of this property of language models on the dissociability of frequency effects and predictability effects in naturalistic reading. Regression results show robust positive effects of language model size and training data amount on the ability of word frequency to explain variance in held-out reading times as the contribution due to surprisal declines, which suggests a strong compensatory relationship between frequency and language model surprisal. Additionally, an analysis of the learning trajectories of low-frequency tokens reveals that the influence of model size is strongest on the prediction of tokens that are not part of a bigram sequence observed earlier in the context that models can readily copy, which suggests that limitations in model size create pressures toward learning more general associations. Taken together, these results suggest that the observed frequency effects may be due to imperfect estimates of predictability, and may disappear entirely as better-fitting language models are discovered. This further highlights the importance of exploring additional language models as models of human sentence processing.
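The abstract describes comparing how much variance in held-out reading times a word-frequency predictor explains beyond language model surprisal. The sketch below is a minimal illustration of that kind of comparison, not the authors' actual pipeline; the column names, the simulated data, the length control, and the simple train/test split are all assumptions made for illustration.

```python
# Minimal sketch (not the authors' pipeline): does log word frequency explain
# held-out reading-time variance over and above LM surprisal?
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical per-word data: reading time, log unigram frequency, LM surprisal, length.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "rt": rng.gamma(shape=5.0, scale=50.0, size=n),        # reading time (ms)
    "log_freq": rng.normal(loc=-10.0, scale=2.0, size=n),  # log unigram frequency
    "surprisal": rng.gamma(shape=2.0, scale=3.0, size=n),  # LM surprisal (bits)
    "length": rng.integers(1, 12, size=n),                 # word length (characters)
})

# Held-out evaluation: fit on one half, score predictions on the other.
train, test = df.iloc[: n // 2], df.iloc[n // 2:]

# Baseline regression with surprisal and a length control; full model adds frequency.
baseline = smf.ols("rt ~ surprisal + length", data=train).fit()
full = smf.ols("rt ~ surprisal + length + log_freq", data=train).fit()

def held_out_sse(model, data):
    """Sum of squared errors of the model's predictions on held-out reading times."""
    resid = data["rt"] - model.predict(data)
    return float((resid ** 2).sum())

# A lower held-out SSE for the full model indicates that frequency still explains
# variance that this particular surprisal estimate does not capture.
print("baseline SSE:", held_out_sse(baseline, test))
print("full SSE:    ", held_out_sse(full, test))
```

On this logic, the article's finding is that the frequency predictor's added contribution grows as larger, better-trained models' surprisal estimates account for less of the reading-time variance on their own.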
Journal Introduction:
Articles in the Journal of Memory and Language contribute to the formulation of scientific issues and theories in the areas of memory, language comprehension and production, and cognitive processes. Special emphasis is given to research articles that provide new theoretical insights based on a carefully laid empirical foundation. The journal generally favors articles that provide multiple experiments. In addition, significant theoretical papers without new experimental findings may be published.
The Journal of Memory and Language is a valuable tool for cognitive scientists, including psychologists, linguists, and others interested in memory and learning, language, reading, and speech.
Research Areas include:
• Topics that illuminate aspects of memory or language processing
• Linguistics
• Neuropsychology.