Memory for prediction: A Transformer-based theory of sentence processing

Impact Factor 3.0 · Tier 1 (Psychology) · JCR Q1 (Linguistics)
Soo Hyun Ryu, Richard L. Lewis
{"title":"Memory for prediction: A Transformer-based theory of sentence processing","authors":"Soo Hyun Ryu ,&nbsp;Richard L. Lewis","doi":"10.1016/j.jml.2025.104670","DOIUrl":null,"url":null,"abstract":"<div><div>We demonstrate that Transformer-based neural network language models provide a new foundation for mechanistic theories of sentence processing that seamlessly integrate expectation-based and memory-based accounts. First, we show that the attention mechanism in GPT2-small operates as a kind of cue-based retrieval architecture that is subject to similarity-based interference. Second, we show that it provides accounts of classic memory effects in parsing, including contrasts involving relative clauses and center-embedding. Third, we show that a simple word-by-word entropy metric computed over the internal attention patterns provides an index of memory interference that explains variance in eye-tracking and self-paced reading time measures (independent of surprisal and other predictors) in two natural story reading time corpora. Because the cues and representations are learned, there is no need for the theorist to postulate representational features and cues. Transformers provide practical modeling tools for exploring the effects of memory and experience, given the increasing availability of both pre-trained models and software for training new models, and the ease with which surprisal and attention entropy metrics may be computed.</div></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"145 ","pages":"Article 104670"},"PeriodicalIF":3.0000,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of memory and language","FirstCategoryId":"102","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0749596X25000634","RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"LINGUISTICS","Score":null,"Total":0}
Citations: 0

Abstract

We demonstrate that Transformer-based neural network language models provide a new foundation for mechanistic theories of sentence processing that seamlessly integrate expectation-based and memory-based accounts. First, we show that the attention mechanism in GPT2-small operates as a kind of cue-based retrieval architecture that is subject to similarity-based interference. Second, we show that it provides accounts of classic memory effects in parsing, including contrasts involving relative clauses and center-embedding. Third, we show that a simple word-by-word entropy metric computed over the internal attention patterns provides an index of memory interference that explains variance in eye-tracking and self-paced reading time measures (independent of surprisal and other predictors) in two natural story reading time corpora. Because the cues and representations are learned, there is no need for the theorist to postulate representational features and cues. Transformers provide practical modeling tools for exploring the effects of memory and experience, given the increasing availability of both pre-trained models and software for training new models, and the ease with which surprisal and attention entropy metrics may be computed.
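The abstract notes that surprisal and attention-entropy metrics are straightforward to compute from pre-trained Transformers. As an illustration only, the sketch below computes per-token surprisal and a simple attention-entropy measure from GPT-2 small using the Hugging Face transformers library; the function name, the example sentence, and the choice to average entropy over all heads and layers are assumptions for this sketch, since the paper's exact aggregation is not specified in the abstract.

```python
# Illustrative sketch only: per-token surprisal and attention entropy from
# GPT-2 small. Averaging over all layers and heads is an assumption here,
# not necessarily the metric used in the paper.
import math

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2", output_attentions=True)
model.eval()

def surprisal_and_attention_entropy(sentence: str):
    """Return tokens, per-token surprisal (bits), and mean attention entropy (bits)."""
    enc = tokenizer(sentence, return_tensors="pt")
    input_ids = enc["input_ids"]
    with torch.no_grad():
        out = model(input_ids)

    # Surprisal of token t given its left context: -log2 P(w_t | w_<t).
    log_probs = torch.log_softmax(out.logits, dim=-1)
    target_logp = log_probs[0, :-1, :].gather(1, input_ids[0, 1:].unsqueeze(-1)).squeeze(-1)
    surprisal = (-target_logp / math.log(2)).tolist()

    # Attention entropy: Shannon entropy of each token's attention distribution
    # over the preceding tokens, averaged over heads and layers.
    attn = torch.stack(out.attentions)                    # (layers, 1, heads, query, key)
    entropy = -(attn * torch.log2(attn + 1e-12)).sum(-1)  # entropy over the key axis
    mean_entropy = entropy.mean(dim=(0, 1, 2)).tolist()   # one value per query token

    tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
    return tokens, surprisal, mean_entropy

# Example usage (sentence chosen arbitrarily for illustration):
tokens, surp, ent = surprisal_and_attention_entropy(
    "The reporter who the senator attacked admitted the error.")
for tok, s, e in zip(tokens[1:], surp, ent[1:]):
    print(f"{tok:>12}  surprisal={s:5.2f} bits  attention entropy={e:5.2f} bits")
```

On this reading of the abstract, higher attention entropy for a given word would indicate that its attention is spread over many competing prior tokens, which is the kind of word-by-word interference index the authors relate to reading times alongside surprisal.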
Source journal: Journal of Memory and Language
CiteScore: 8.70
Self-citation rate: 14.00%
Articles per year: 49
Review time: 12.7 weeks
Journal description: Articles in the Journal of Memory and Language contribute to the formulation of scientific issues and theories in the areas of memory, language comprehension and production, and cognitive processes. Special emphasis is given to research articles that provide new theoretical insights based on a carefully laid empirical foundation. The journal generally favors articles that report multiple experiments, although significant theoretical papers without new experimental findings may also be published. The journal is a valuable resource for cognitive scientists, including psychologists, linguists, and others interested in memory and learning, language, reading, and speech. Research areas include: topics that illuminate aspects of memory or language processing; linguistics; and neuropsychology.