Defying Multi-Model Forgetting in One-Shot Neural Architecture Search Using Orthogonal Gradient Learning

Impact Factor 3.6 | CAS Zone 2 (Computer Science) | JCR Q2 (Computer Science, Hardware & Architecture)
Lianbo Ma;Yuee Zhou;Ye Ma;Guo Yu;Qing Li;Qiang He;Yan Pei
{"title":"利用正交梯度学习克服单次神经结构搜索中的多模型遗忘","authors":"Lianbo Ma;Yuee Zhou;Ye Ma;Guo Yu;Qing Li;Qiang He;Yan Pei","doi":"10.1109/TC.2025.3540650","DOIUrl":null,"url":null,"abstract":"One-shot neural architecture search (NAS) trains an over-parameterized network (termed as supernet) that assembles all the architectures as its subnets by using weight sharing for computational budget reduction. However, there is an issue of multi-model forgetting during supernet training that some weights of the previously well-trained architecture will be overwritten by that of the newly sampled architecture which has overlapped structures with the old one. To overcome the issue, we propose an orthogonal gradient learning (OGL) guided supernet training paradigm, where the novelty lies in the fact that the weights of the overlapped structures of current architecture are updated in the orthogonal direction to the gradient space of these overlapped structures of all previously trained architectures. Moreover, a new approach of calculating the projection is designed to effectively find the base vectors of the gradient space to acquire the orthogonal direction. We have theoretically and experimentally proved the effectiveness of the proposed paradigm in overcoming the multi-model forgetting. Besides, we apply the proposed paradigm to two one-shot NAS baselines, and experimental results demonstrate that our approach is able to mitigate the multi-model forgetting and enhance the predictive ability of the supernet with remarkable efficiency on popular test datasets.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 5","pages":"1678-1689"},"PeriodicalIF":3.6000,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Defying Multi-Model Forgetting in One-Shot Neural Architecture Search Using Orthogonal Gradient Learning\",\"authors\":\"Lianbo Ma;Yuee Zhou;Ye Ma;Guo Yu;Qing Li;Qiang He;Yan Pei\",\"doi\":\"10.1109/TC.2025.3540650\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"One-shot neural architecture search (NAS) trains an over-parameterized network (termed as supernet) that assembles all the architectures as its subnets by using weight sharing for computational budget reduction. However, there is an issue of multi-model forgetting during supernet training that some weights of the previously well-trained architecture will be overwritten by that of the newly sampled architecture which has overlapped structures with the old one. To overcome the issue, we propose an orthogonal gradient learning (OGL) guided supernet training paradigm, where the novelty lies in the fact that the weights of the overlapped structures of current architecture are updated in the orthogonal direction to the gradient space of these overlapped structures of all previously trained architectures. Moreover, a new approach of calculating the projection is designed to effectively find the base vectors of the gradient space to acquire the orthogonal direction. We have theoretically and experimentally proved the effectiveness of the proposed paradigm in overcoming the multi-model forgetting. 
Besides, we apply the proposed paradigm to two one-shot NAS baselines, and experimental results demonstrate that our approach is able to mitigate the multi-model forgetting and enhance the predictive ability of the supernet with remarkable efficiency on popular test datasets.\",\"PeriodicalId\":13087,\"journal\":{\"name\":\"IEEE Transactions on Computers\",\"volume\":\"74 5\",\"pages\":\"1678-1689\"},\"PeriodicalIF\":3.6000,\"publicationDate\":\"2025-02-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Computers\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10880105/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computers","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10880105/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

One-shot neural architecture search (NAS) trains an over-parameterized network (termed the supernet) that assembles all candidate architectures as its subnets through weight sharing, reducing the computational budget. However, supernet training suffers from multi-model forgetting: some weights of a previously well-trained architecture are overwritten by those of a newly sampled architecture whose structure overlaps with the old one. To overcome this issue, we propose an orthogonal gradient learning (OGL) guided supernet training paradigm, whose novelty lies in updating the weights of the overlapped structures of the current architecture in the direction orthogonal to the gradient space of the overlapped structures of all previously trained architectures. Moreover, a new approach to calculating the projection is designed to efficiently find the basis vectors of this gradient space and thereby obtain the orthogonal direction. We prove the effectiveness of the proposed paradigm in overcoming multi-model forgetting both theoretically and experimentally. Furthermore, we apply the paradigm to two one-shot NAS baselines, and experimental results demonstrate that our approach mitigates multi-model forgetting and enhances the predictive ability of the supernet with remarkable efficiency on popular test datasets.
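To make the core idea of the abstract concrete, the sketch below illustrates the standard orthogonal-gradient projection that OGL-style training builds on: the gradient of the shared (overlapped) weights for the current subnet is projected onto the orthogonal complement of the space spanned by gradients recorded from previously trained subnets, here using a QR factorization to obtain the basis vectors. The function name `orthogonalize_gradient`, the NumPy/QR construction, and the toy data are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def orthogonalize_gradient(grad, prev_grads):
    """Return the component of `grad` orthogonal to the space spanned by
    `prev_grads` (hypothetical stored gradients of the shared/overlapped
    weights from previously trained subnets)."""
    if not prev_grads:
        return grad
    # QR factorization gives an orthonormal basis Q for the span of the
    # previously stored gradients (the "gradient space" of old subnets).
    G = np.stack(prev_grads, axis=1)   # shape (d, k): one column per old gradient
    Q, _ = np.linalg.qr(G)             # Q has orthonormal columns, shape (d, k)
    # Remove the projection of the current gradient onto that basis, so the
    # remaining update has no component along the old gradient directions.
    return grad - Q @ (Q.T @ grad)

# Toy check: the projected gradient is (numerically) orthogonal to every
# previously stored gradient.
rng = np.random.default_rng(0)
old = [rng.standard_normal(8) for _ in range(3)]
new = rng.standard_normal(8)
new_orth = orthogonalize_gradient(new, old)
print([round(float(np.dot(new_orth, g)), 10) for g in old])  # ~[0.0, 0.0, 0.0]
```

In a supernet setting, this projection would presumably be applied only to the weights that overlap with earlier-trained subnets, while non-shared weights follow their raw gradients.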
Source Journal
IEEE Transactions on Computers (Engineering & Technology: Electrical & Electronic Engineering)
CiteScore: 6.60
Self-citation rate: 5.40%
Articles per year: 199
Review time: 6.0 months
Journal Description: The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.