{"title":"Beyond Markov: Transformers, memory, and attention.","authors":"Thomas Parr, Giovanni Pezzulo, Karl Friston","doi":"10.1080/17588928.2025.2484485","DOIUrl":null,"url":null,"abstract":"<p><p>This paper asks what predictive processing models of brain function can learn from the success of transformer architectures. We suggest that the reason transformer architectures have been successful is that they implicitly commit to a non-Markovian generative model - in which we need memory to contextualize our current observations and make predictions about the future. Interestingly, both the notions of working memory in cognitive science and transformer architectures rely heavily upon the concept of attention. We will argue that the move beyond Markov is crucial in the construction of generative models capable of dealing with much of the sequential data - and certainly language - that our brains contend with. We characterize two broad approaches to this problem - deep temporal hierarchies and autoregressive models - with transformers being an example of the latter. Our key conclusions are that transformers benefit heavily from their use of embedding spaces that place strong metric priors on an implicit latent variable and utilize this metric to direct a form of attention that highlights the most relevant, and not only the most recent, previous elements in a sequence to help predict the next.</p>","PeriodicalId":10413,"journal":{"name":"Cognitive Neuroscience","volume":" ","pages":"1-19"},"PeriodicalIF":2.0000,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cognitive Neuroscience","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1080/17588928.2025.2484485","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"NEUROSCIENCES","Score":null,"Total":0}
Citations: 0
Abstract
This paper asks what predictive processing models of brain function can learn from the success of transformer architectures. We suggest that the reason transformer architectures have been successful is that they implicitly commit to a non-Markovian generative model - in which we need memory to contextualize our current observations and make predictions about the future. Interestingly, both the notions of working memory in cognitive science and transformer architectures rely heavily upon the concept of attention. We will argue that the move beyond Markov is crucial in the construction of generative models capable of dealing with much of the sequential data - and certainly language - that our brains contend with. We characterize two broad approaches to this problem - deep temporal hierarchies and autoregressive models - with transformers being an example of the latter. Our key conclusions are that transformers benefit heavily from their use of embedding spaces that place strong metric priors on an implicit latent variable and utilize this metric to direct a form of attention that highlights the most relevant, and not only the most recent, previous elements in a sequence to help predict the next.
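To make the final point concrete, the sketch below shows single-head scaled dot-product attention in Python/NumPy, assuming the standard softmax(QKᵀ/√d)V formulation. It illustrates how a metric on the embedding space (here, dot-product similarity) weights every previous element in the sequence by relevance rather than recency; the omission of learned query/key/value projections and of causal masking is a simplification for illustration, not a detail taken from the paper.

```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    """Weight every past element by its relevance (dot-product similarity
    in the embedding space), not by how recently it occurred."""
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)            # metric-based relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the whole context
    return weights @ values, weights

# Toy context: five token embeddings in a 4-dimensional embedding space.
rng = np.random.default_rng(0)
context = rng.normal(size=(5, 4))   # all previous elements, not just the last one
query = context[-1:]                # the prediction of the next element is conditioned...
output, weights = scaled_dot_product_attention(query, context, context)
print(weights)                      # ...on every prior element, weighted by similarity
```

In contrast, a first-order Markov model would condition the prediction only on `context[-1]`; the attention weights make the dependence on the full history explicit, which is the sense in which the architecture is non-Markovian.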
Journal Description
Cognitive Neuroscience publishes high-quality discussion papers and empirical papers on any topic in the field of cognitive neuroscience, including perception, attention, memory, language, action, social cognition, and executive function. The journal covers findings based on a variety of techniques such as fMRI, ERPs, MEG, TMS, and focal lesion studies. Contributions that employ or discuss multiple techniques to shed light on the spatiotemporal brain mechanisms underlying a cognitive process are encouraged.