Title: Defying Multi-Model Forgetting in One-Shot Neural Architecture Search Using Orthogonal Gradient Learning
Authors: Lianbo Ma; Yuee Zhou; Ye Ma; Guo Yu; Qing Li; Qiang He; Yan Pei
Journal: IEEE Transactions on Computers, vol. 74, no. 5, pp. 1678-1689
Publication date: 2025-02-11
Publication type: Journal Article
DOI: 10.1109/TC.2025.3540650
URL: https://ieeexplore.ieee.org/document/10880105/
Impact factor: 3.6 (JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)
Open access: no
Citations: 0
Abstract
One-shot neural architecture search (NAS) trains an over-parameterized network, termed a supernet, that contains all candidate architectures as subnets and uses weight sharing to reduce the computational budget. However, supernet training suffers from multi-model forgetting: some weights of a previously well-trained architecture are overwritten by those of a newly sampled architecture whose structure overlaps with the old one. To overcome this issue, we propose an orthogonal gradient learning (OGL) guided supernet training paradigm. Its novelty is that the weights of the overlapped structures of the current architecture are updated in a direction orthogonal to the gradient space of those overlapped structures in all previously trained architectures. Moreover, we design a new approach to calculating the projection that effectively finds the base vectors of the gradient space and thereby yields the orthogonal direction. We show, both theoretically and experimentally, that the proposed paradigm is effective in overcoming multi-model forgetting. In addition, we apply the paradigm to two one-shot NAS baselines, and experimental results demonstrate that our approach mitigates multi-model forgetting and enhances the predictive ability of the supernet with remarkable efficiency on popular test datasets.
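The orthogonal update described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's projection method, which the abstract does not specify: here the base vectors are obtained with plain Gram-Schmidt, and the function names `orthonormal_basis` and `project_orthogonal`, the tolerance `tol`, and the toy gradients are all hypothetical choices made for illustration.

```python
import numpy as np


def orthonormal_basis(grads, tol=1e-10):
    """Build an orthonormal basis of the space spanned by stored gradients.

    Assumption: plain Gram-Schmidt stands in for the paper's (unspecified)
    way of finding the base vectors of the gradient space.
    """
    basis = []
    for g in grads:
        v = np.asarray(g, dtype=float).copy()
        # Remove the components already covered by earlier base vectors.
        for b in basis:
            v -= np.dot(v, b) * b
        norm = np.linalg.norm(v)
        # Skip gradients that are (numerically) linearly dependent.
        if norm > tol:
            basis.append(v / norm)
    return np.array(basis)


def project_orthogonal(grad, basis):
    """Return the part of `grad` orthogonal to every vector in `basis`."""
    g = np.asarray(grad, dtype=float).copy()
    for b in basis:
        g -= np.dot(g, b) * b
    return g


# Toy example: gradients of the shared weights recorded while training
# two earlier architectures (hypothetical values).
old_grads = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormal_basis(old_grads)

# Gradient of the same shared weights for the newly sampled architecture.
new_grad = [0.5, -2.0, 3.0]
update = project_orthogonal(new_grad, basis)  # -> [0, 0, 3]
```

In this toy case the projected update has zero inner product with both stored gradients, so to first order a step along it leaves the losses of the earlier architectures unchanged on the shared weights. That is the intuition behind updating in the orthogonal direction.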
About the journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.