Feature Normalization Prevents Collapse of Noncontrastive Learning Dynamics.

IF 2.1 · CAS Tier 4 (Computer Science) · JCR Q3 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Han Bao
{"title":"Feature Normalization Prevents Collapse of Noncontrastive Learning Dynamics.","authors":"Han Bao","doi":"10.1162/neco.a.27","DOIUrl":null,"url":null,"abstract":"<p><p>Contrastive learning is a self-supervised representation learning framework where two positive views generated through data augmentation are made similar by an attraction force in a data representation space, while a repulsive force makes them far from negative examples. Noncontrastive learning, represented by BYOL and SimSiam, gets rid of negative examples and improves computational efficiency. While learned representations may collapse into a single point due to the lack of the repulsive force at first sight, Tian et al. (2021) revealed through learning dynamics analysis that the representations can avoid collapse if data augmentation is sufficiently stronger than regularization. However, their analysis does not take into account commonly used feature normalization, a normalizer before measuring the similarity of representations, and hence excessively strong regularization may still collapse the dynamics, an unnatural behavior under the presence of feature normalization. Therefore, we extend the previous theory based on the L2 loss by considering the cosine loss instead, which involves feature normalization. We show that the cosine loss induces sixth-order dynamics (while the L2 loss induces a third-order one), in which a stable equilibrium dynamically emerges even if there are only collapsed solutions with given initial parameters. Thus, we offer a new understanding that feature normalization plays an important role in robustly preventing the dynamics collapse.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"2079-2124"},"PeriodicalIF":2.1000,"publicationDate":"2025-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Computation","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1162/neco.a.27","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Contrastive learning is a self-supervised representation learning framework in which two positive views generated through data augmentation are pulled together by an attraction force in the representation space, while a repulsive force pushes them away from negative examples. Noncontrastive learning, represented by BYOL and SimSiam, dispenses with negative examples and thereby improves computational efficiency. Although at first sight the learned representations might collapse into a single point due to the lack of a repulsive force, Tian et al. (2021) revealed through a learning dynamics analysis that the representations can avoid collapse if data augmentation is sufficiently stronger than regularization. However, their analysis does not take into account the commonly used feature normalization, a normalizer applied before measuring the similarity of representations, and hence excessively strong regularization may still collapse the dynamics, an unnatural behavior in the presence of feature normalization. Therefore, we extend the previous theory, which is based on the L2 loss, by instead considering the cosine loss, which involves feature normalization. We show that the cosine loss induces sixth-order dynamics (while the L2 loss induces third-order dynamics), in which a stable equilibrium dynamically emerges even if only collapsed solutions exist for the given initial parameters. Thus, we offer a new understanding that feature normalization plays an important role in robustly preventing collapse of the dynamics.
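For concreteness, the sketch below contrasts the two losses in PyTorch for a SimSiam-style setup. This is an illustration under standard BYOL/SimSiam conventions (stop-gradient on the target branch), not code from the paper; the helper names `l2_loss` and `cosine_loss` are hypothetical.

```python
import torch
import torch.nn.functional as F

def l2_loss(p: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    # Squared-error (L2) loss between predictor output p and target z.
    # The target branch is detached (stop-gradient), as in BYOL/SimSiam.
    # Without feature normalization, strong regularization can drive
    # both branches toward the origin, i.e., complete collapse.
    return ((p - z.detach()) ** 2).sum(dim=1).mean()

def cosine_loss(p: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    # Negative cosine similarity between p and z. Both features are
    # L2-normalized before the inner product; this normalization is the
    # "feature normalization" whose effect the paper analyzes.
    p = F.normalize(p, dim=1)           # p / ||p||
    z = F.normalize(z.detach(), dim=1)  # stop-gradient on the target
    return -(p * z).sum(dim=1).mean()

# Toy usage: features of two augmented views, shape (batch, dim).
p = torch.randn(8, 128, requires_grad=True)
z = torch.randn(8, 128)
print(l2_loss(p, z).item(), cosine_loss(p, z).item())
```

Because `cosine_loss` is invariant to the scale of either feature, its gradient dynamics differ qualitatively from those of the unnormalized L2 loss, which is the distinction the abstract's sixth-order versus third-order comparison refers to.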

Source journal: Neural Computation (Engineering & Technology - Computer Science: Artificial Intelligence)
CiteScore: 6.30
Self-citation rate: 3.40%
Articles per year: 83
Review time: 3.0 months
Journal description: Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.