Low-dimensional intrinsic dimension reveals a phase transition in gradient-based learning of deep neural networks

IF 2.7 | CAS Region 3 (Computer Science) | JCR Q2 | COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Machine Learning and Cybernetics | Pub Date: 2024-07-04 | DOI: 10.1007/s13042-024-02244-x

Chengli Tan, Jiangshe Zhang, Junmin Liu, Zixiang Zhao


Citations: 0

Abstract

Deep neural networks complete a feature extraction task by propagating the inputs through multiple modules. However, how the representations evolve under gradient-based optimization remains unknown. Here we leverage the intrinsic dimension of the representations to study the learning dynamics and find that the training process undergoes a phase transition from expansion to compression under disparate training regimes. Surprisingly, this phenomenon is ubiquitous across a wide variety of model architectures, optimizers, and data sets. We demonstrate that the variation in the intrinsic dimension is consistent with the complexity of the learned hypothesis, which can be quantitatively assessed by the critical sample ratio, a measure rooted in adversarial robustness. We also show mathematically that this phenomenon can be analyzed in terms of the changing correlation between neurons. Although evoked activities in biological circuits obey a power-law decay rule, we find that the power-law exponent of the representations in deep neural networks predicts adversarial robustness well only at the end of training, not during the training process. Together, these results suggest that deep neural networks are prone to producing robust representations by adaptively eliminating or retaining redundancies. The code is publicly available at https://github.com/cltan023/learning2022.
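To make "intrinsic dimension of the representations" concrete, the following is a minimal sketch of one common estimator, the two-nearest-neighbor (TwoNN) maximum-likelihood estimate. The abstract does not say which estimator the authors use, so the choice of TwoNN, the function name `twonn_intrinsic_dimension`, and the synthetic data are all assumptions made for illustration; the authors' own implementation is in the repository linked above.

```python
import numpy as np


def twonn_intrinsic_dimension(x):
    """TwoNN maximum-likelihood estimate of intrinsic dimension.

    x: array of shape (n_samples, n_features), for example the
    activations of one layer on a batch of inputs (an assumption here;
    any point cloud works).
    """
    x = np.asarray(x, dtype=np.float64)
    # Pairwise squared Euclidean distances via the Gram matrix.
    sq = np.sum(x * x, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.maximum(d2, 0.0, out=d2)   # clip tiny negatives from round-off
    np.fill_diagonal(d2, np.inf)  # exclude self-distances
    # Two smallest distances in each row: r1 <= r2.
    nearest = np.sqrt(np.partition(d2, 1, axis=1)[:, :2])
    r1, r2 = nearest[:, 0], nearest[:, 1]
    keep = r1 > 0                 # drop exact duplicate points
    mu = r2[keep] / r1[keep]
    # Under the TwoNN model, mu follows a Pareto law with shape d,
    # so the maximum-likelihood estimate is n / sum(log mu).
    return mu.size / np.sum(np.log(mu))


if __name__ == "__main__":
    # Hypothetical data: a 3-D uniform cloud embedded in 20 dimensions.
    rng = np.random.default_rng(0)
    latent = rng.uniform(size=(1000, 3))
    basis, _ = np.linalg.qr(rng.normal(size=(20, 3)))  # orthonormal columns
    points = latent @ basis.T
    print(twonn_intrinsic_dimension(points))
```

On this synthetic cloud an estimate near 3 would be expected, even though the ambient dimension is 20. Tracking such an estimate for a fixed layer across training epochs is one way to look for the expansion-then-compression pattern the abstract describes; the pairwise-distance matrix makes this sketch quadratic in the number of samples, so it suits batches of a few thousand points at most.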


Source journal

International Journal of Machine Learning and Cybernetics (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore: 7.90

Self-citation rate: 10.70%

Articles published: 225

Journal description: Cybernetics is concerned with describing complex interactions and interrelationships between systems which are omnipresent in our daily life. Machine Learning discovers fundamental functional relationships between variables and ensembles of variables in systems. The merging of the disciplines of Machine Learning and Cybernetics is aimed at the discovery of various forms of interaction between systems through diverse mechanisms of learning from data. The International Journal of Machine Learning and Cybernetics (IJMLC) focuses on the key research problems emerging at the junction of machine learning and cybernetics and serves as a broad forum for rapid dissemination of the latest advancements in the area. The emphasis of IJMLC is on the hybrid development of machine learning and cybernetics schemes inspired by different contributing disciplines such as engineering, mathematics, cognitive sciences, and applications. New ideas, design alternatives, implementations and case studies pertaining to all the aspects of machine learning and cybernetics fall within the scope of the IJMLC. Key research areas to be covered by the journal include: machine learning for modeling interactions between systems; pattern recognition technology to support discovery of system-environment interaction; control of system-environment interactions; biochemical interaction in biological and biologically-inspired systems; and learning for improvement of communication schemes between systems.