Dissociating model architectures from inference computations
Noor Sajid, Johan Medrano
Cognitive Neuroscience, pp. 1-3. Published online: 2025-07-17.
DOI: 10.1080/17588928.2025.2532604 (https://doi.org/10.1080/17588928.2025.2532604)
Impact factor: 2.0 · JCR: Q3 (Neurosciences) · CAS: Region 4 (Medicine)
Citations: 0
Abstract
Parr et al. (2025) examine how autoregressive and deep temporal models differ in their treatment of non-Markovian sequence modelling. Building on this, we highlight the need to dissociate model architectures, i.e., how the predictive distribution factorises, from the computations invoked at inference. We demonstrate that autoregressive models can mimic deep temporal computations by structuring context access during iterative inference. Using a transformer trained on next-token prediction, we show that inducing a hierarchical temporal factorisation during iterative inference maintains predictive capacity while instantiating fewer computations. This emphasises that the processes for constructing and refining predictions are not necessarily bound to their underlying model architectures.
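The abstract's central idea, that an autoregressive model can approximate a deep temporal (hierarchical) factorisation simply by restricting how it accesses its context at inference, can be caricatured with a toy sketch. All names here (`full_context`, `hierarchical_context`, the averaging summariser, and the window/chunk sizes) are illustrative assumptions, not the authors' actual method: the point is only that compressing the distant past into coarse summaries reduces the number of elements touched per prediction step.

```python
# Toy illustration (not the paper's implementation): contrast full
# autoregressive context access with a hierarchical scheme in which the
# distant past is compressed into coarse chunk summaries while recent
# tokens are kept verbatim, mimicking a slow/fast temporal factorisation.

def full_context(seq, t):
    """Full autoregressive access: predicting position t reads all t past tokens."""
    return list(seq[:t])

def hierarchical_context(seq, t, window=4, chunk=4):
    """Keep the last `window` tokens verbatim; replace the distant past with
    one average per `chunk` tokens, so each step touches fewer elements."""
    recent = list(seq[max(0, t - window):t])
    distant = seq[:max(0, t - window)]
    summaries = [
        sum(distant[i:i + chunk]) / len(distant[i:i + chunk])
        for i in range(0, len(distant), chunk)
    ]
    return summaries + recent

seq = list(range(16))
print(len(full_context(seq, 16)))          # 16 elements read at step 16
print(len(hierarchical_context(seq, 16)))  # 7: 3 chunk summaries + 4 recent tokens
```

Both schemes condition on the whole history, but the hierarchical variant instantiates fewer computations per step, which is the dissociation the abstract emphasises: the factorisation is imposed at inference time, not baked into the model architecture.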
About the journal:
Cognitive Neuroscience publishes high-quality discussion papers and empirical papers on any topic in the field of cognitive neuroscience, including perception, attention, memory, language, action, social cognition, and executive function. The journal covers findings based on a variety of techniques such as fMRI, ERPs, MEG, TMS, and focal lesion studies. Contributions that employ or discuss multiple techniques to shed light on the spatio-temporal brain mechanisms underlying a cognitive process are encouraged.