Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
Róbert Csordás, Christopher Potts, Christopher D. Manning, Atticus Geiger
arXiv-2408.10920, arXiv - CS - Neural and Evolutionary Computing, 2024-08-20
Abstract
The Linear Representation Hypothesis (LRH) states that neural networks learn to encode concepts as directions in activation space, and a strong version of the LRH states that models learn only such encodings. In this paper, we present a counterexample to this strong LRH: when trained to repeat an input token sequence, gated recurrent neural networks (RNNs) learn to represent the token at each position with a particular order of magnitude, rather than a direction. These representations have layered features that are impossible to locate in distinct linear subspaces. To show this, we train interventions to predict and manipulate tokens by learning the scaling factor corresponding to each sequence position. These interventions indicate that the smallest RNNs find only this magnitude-based solution, while larger RNNs have linear representations. These findings strongly indicate that interpretability research should not be confined by the LRH.
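The abstract does not spell out the construction, but the following minimal sketch (our illustration, not the authors' code) shows what a magnitude-based, layered encoding can look like: each token is stored at its own order of magnitude along a single scalar, and a position-specific scaling factor, analogous to the learned interventions described above, reads it back out. The `encode`/`decode` helpers and the choice of `base` are assumptions made for the example.

```python
# Toy illustration (not the paper's exact mechanism): store a token sequence
# at different orders of magnitude in one scalar, then recover any position
# with a position-specific scaling factor.

def encode(tokens, base=10.0):
    """Layer token i at order of magnitude base**-(i+1) within a single scalar."""
    return sum(t * base ** -(i + 1) for i, t in enumerate(tokens))

def decode(h, position, base=10.0):
    """Recover the token at `position` by rescaling, then stripping the
    larger-magnitude (earlier) and smaller-magnitude (later) layers."""
    scaled = h * base ** (position + 1) + 1e-6  # small epsilon guards against float error
    return int(scaled) % int(base)

tokens = [3, 1, 4, 1, 5]   # token ids, each smaller than `base`
h = encode(tokens)         # all positions share the same scalar "direction"
assert [decode(h, p) for p in range(len(tokens))] == tokens
```

Because the layers overlap in the same dimension rather than occupying distinct linear subspaces, no fixed linear probe isolates a single position; this is the sense in which such features resist a purely LRH-style reading.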