{"title":"Transformer Models for Signal Processing: Scaled Dot-Product Attention Implements Constrained Filtering","authors":"Terence D. Sanger","doi":"10.1162/neco.a.29","DOIUrl":null,"url":null,"abstract":"The remarkable success of the transformer machine learning architecture for processing language sequences far exceeds the performance of classical signal processing methods. A unique component of transformer models is the scaled dot-product attention (SDPA) layer, which does not appear to have an analog in prior signal processing algorithms. Here, we show that SDPA operates using a novel principle that projects the current state estimate onto the space spanned by prior estimates. We show that SDPA, when used for causal recursive state estimation, implements constrained state estimation in circumstances where the constraint is unknown and may be time varying. Since constraints in high-dimensional space may represent the complex relationships that define nonlinear signals and models, this suggests that the SDPA layer and transformer models leverage constrained estimation to achieve their success. This also suggests that transformers and the SPDA layer could be a computational model for previously unexplained capabilities of human behavior.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 10","pages":"1839-1852"},"PeriodicalIF":2.1000,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Computation","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11180100/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
The remarkable success of the transformer machine learning architecture for processing language sequences far exceeds that of classical signal processing methods. A unique component of transformer models is the scaled dot-product attention (SDPA) layer, which does not appear to have an analog in prior signal processing algorithms. Here, we show that SDPA operates using a novel principle that projects the current state estimate onto the space spanned by prior estimates. We show that SDPA, when used for causal recursive state estimation, implements constrained state estimation in circumstances where the constraint is unknown and may be time-varying. Since constraints in high-dimensional space may represent the complex relationships that define nonlinear signals and models, this suggests that the SDPA layer and transformer models leverage constrained estimation to achieve their success. This also suggests that transformers and the SDPA layer could be a computational model for previously unexplained capabilities of human behavior.
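For readers unfamiliar with the layer the abstract analyzes, the sketch below is a minimal NumPy implementation of causal scaled dot-product attention in its standard form, softmax(QK^T / sqrt(d)) V with a mask that blocks attention to future time steps. It is a generic illustration of the SDPA operation only, not the constrained-estimation formulation developed in the paper; the function name and example data are hypothetical.

```python
import numpy as np

def causal_sdpa(Q, K, V):
    """Causal scaled dot-product attention (hypothetical helper for illustration).

    Q, K, V: arrays of shape (T, d), one query/key/value vector per time step.
    Row t of the output is a convex combination of the value vectors V[0..t],
    so each estimate is formed only from current and past inputs.
    """
    T, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                     # (T, T) scaled similarities
    future = np.triu(np.ones((T, T), dtype=bool), 1)  # entries strictly above the diagonal
    scores = np.where(future, -np.inf, scores)        # forbid attention to future steps
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # row-wise softmax
    return weights @ V                                # (T, d) attention output

# Example: self-attention over 5 time steps of a 4-dimensional signal
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 4))
out = causal_sdpa(x, x, x)
print(out.shape)  # (5, 4)
```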
About the Journal:
Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.