{"title":"Memory for prediction: A Transformer-based theory of sentence processing","authors":"Soo Hyun Ryu , Richard L. Lewis","doi":"10.1016/j.jml.2025.104670","DOIUrl":null,"url":null,"abstract":"<div><div>We demonstrate that Transformer-based neural network language models provide a new foundation for mechanistic theories of sentence processing that seamlessly integrate expectation-based and memory-based accounts. First, we show that the attention mechanism in GPT2-small operates as a kind of cue-based retrieval architecture that is subject to similarity-based interference. Second, we show that it provides accounts of classic memory effects in parsing, including contrasts involving relative clauses and center-embedding. Third, we show that a simple word-by-word entropy metric computed over the internal attention patterns provides an index of memory interference that explains variance in eye-tracking and self-paced reading time measures (independent of surprisal and other predictors) in two natural story reading time corpora. Because the cues and representations are learned, there is no need for the theorist to postulate representational features and cues. Transformers provide practical modeling tools for exploring the effects of memory and experience, given the increasing availability of both pre-trained models and software for training new models, and the ease with which surprisal and attention entropy metrics may be computed.</div></div>","PeriodicalId":16493,"journal":{"name":"Journal of memory and language","volume":"145 ","pages":"Article 104670"},"PeriodicalIF":3.0000,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of memory and language","FirstCategoryId":"102","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0749596X25000634","RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"LINGUISTICS","Score":null,"Total":0}
Citations: 0
Abstract
We demonstrate that Transformer-based neural network language models provide a new foundation for mechanistic theories of sentence processing that seamlessly integrate expectation-based and memory-based accounts. First, we show that the attention mechanism in GPT2-small operates as a kind of cue-based retrieval architecture that is subject to similarity-based interference. Second, we show that it provides accounts of classic memory effects in parsing, including contrasts involving relative clauses and center-embedding. Third, we show that a simple word-by-word entropy metric computed over the internal attention patterns provides an index of memory interference that explains variance in eye-tracking and self-paced reading time measures (independent of surprisal and other predictors) in two natural story reading time corpora. Because the cues and representations are learned, there is no need for the theorist to postulate representational features and cues. Transformers provide practical modeling tools for exploring the effects of memory and experience, given the increasing availability of both pre-trained models and software for training new models, and the ease with which surprisal and attention entropy metrics may be computed.
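The abstract notes that surprisal and attention entropy metrics are easy to compute from a pre-trained model. As a rough illustration only (not the paper's exact procedure; the example sentence, the averaging over all layers and heads, and the token-level rather than word-level aggregation are assumptions made here), a minimal sketch using the Hugging Face transformers library and the GPT2-small checkpoint might look like this:

import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# GPT2-small corresponds to the "gpt2" checkpoint on the Hugging Face hub.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Illustrative sentence (an object relative clause); not taken from the paper's materials.
sentence = "The reporter who the senator attacked admitted the error."
enc = tokenizer(sentence, return_tensors="pt")
ids = enc["input_ids"][0]

with torch.no_grad():
    out = model(**enc, output_attentions=True)

# Token-level surprisal: -log2 p(token | preceding tokens).
log_probs = torch.log_softmax(out.logits[0], dim=-1)
surprisal = [float("nan")]  # the first token has no preceding context
for i in range(1, ids.size(0)):
    surprisal.append(-log_probs[i - 1, ids[i]].item() / math.log(2))

# Attention entropy: entropy of each token's attention distribution over the
# preceding context, averaged here (as a simplifying assumption) across all
# layers and heads.
attn = torch.stack(out.attentions).squeeze(1)          # (layers, heads, seq, seq)
entropy = -(attn * torch.log2(attn + 1e-12)).sum(-1)   # (layers, heads, seq)
attn_entropy = entropy.mean(dim=(0, 1))                # (seq,)

for tok, s, h in zip(tokenizer.convert_ids_to_tokens(ids), surprisal, attn_entropy.tolist()):
    print(f"{tok:>12}  surprisal={s:6.2f} bits  attention-entropy={h:5.2f} bits")

The point of the sketch is simply that both word-level predictors fall out of a single forward pass; the paper's actual metric may aggregate attention differently (e.g., per layer or head, or by mapping sub-word tokens back to words).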
Journal Introduction:
Articles in the Journal of Memory and Language contribute to the formulation of scientific issues and theories in the areas of memory, language comprehension and production, and cognitive processes. Special emphasis is given to research articles that provide new theoretical insights based on a carefully laid empirical foundation. The journal generally favors articles that provide multiple experiments. In addition, significant theoretical papers without new experimental findings may be published.
The Journal of Memory and Language is a valuable tool for cognitive scientists, including psychologists, linguists, and others interested in memory and learning, language, reading, and speech.
Research Areas include:
• Topics that illuminate aspects of memory or language processing
• Linguistics
• Neuropsychology.