Revisiting LARS for Large Batch Training Generalization of Neural Networks
Khoi Do; Minh-Duong Nguyen; Nguyen Tien Hoa; Long Tran-Thanh; Nguyen H. Tran; Quoc-Viet Pham
IEEE Transactions on Artificial Intelligence, vol. 6, no. 5, pp. 1321-1333. DOI: 10.1109/TAI.2024.3523252 (published online 30 December 2024). Available: https://ieeexplore.ieee.org/document/10817779/
Abstract
This article investigates large batch training techniques using layer-wise adaptive rate scaling (LARS) across diverse settings. In particular, we first show that a state-of-the-art technique, LARS with warm-up, tends to be trapped in sharp minimizers early on due to redundant ratio scaling. Additionally, a fixed steep decline in the latter phase restricts deep neural networks from effectively navigating early-phase sharp minimizers. To address these issues, we propose time-varying LARS (TVLARS), a novel algorithm that replaces warm-up with a configurable sigmoid-like function for robust training in the initial phase. TVLARS promotes gradient exploration early on, escaping sharp minimizers, and gradually transitions to LARS for robustness in later stages. Extensive experiments demonstrate that TVLARS outperforms LARS and LAMB in most cases, with up to a 2% improvement in classification scenarios. In all self-supervised learning cases, TVLARS achieves up to a 10% performance improvement. Our implementation is available at https://github.com/KhoiDOO/tvlars.
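To make the idea concrete, below is a minimal, hypothetical PyTorch sketch of the mechanism the abstract describes: a LARS-style layer-wise trust ratio whose influence is gated by a configurable sigmoid-like factor, favoring raw-gradient exploration early in training and shifting toward LARS-like scaling later. The class name `TVLARSSketch` and the parameters `lambda_` and `t0` are illustrative assumptions, not the paper's exact formulation; the authors' actual implementation is at https://github.com/KhoiDOO/tvlars.

```python
# Illustrative sketch only: a LARS-style update whose trust-ratio scaling is
# blended in via a sigmoid-like, time-varying factor instead of warm-up.
# Names and the exact functional form are assumptions, not the paper's method.
import math
import torch


class TVLARSSketch(torch.optim.Optimizer):
    """Minimal LARS-style optimizer with a sigmoid-like time-varying factor."""

    def __init__(self, params, lr=1.0, weight_decay=1e-4, lambda_=0.01, t0=100):
        defaults = dict(lr=lr, weight_decay=weight_decay, lambda_=lambda_, t0=t0)
        super().__init__(params, defaults)
        self.step_count = 0

    @torch.no_grad()
    def step(self, closure=None):
        loss = closure() if closure is not None else None
        self.step_count += 1
        for group in self.param_groups:
            lr, wd = group["lr"], group["weight_decay"]
            # Sigmoid-like gate: near 1 early (strong gradient exploration),
            # decaying smoothly toward 0 as training proceeds.
            # 0.5 * (1 - tanh(z / 2)) == 1 / (1 + exp(z)), but numerically stable.
            z = group["lambda_"] * (self.step_count - group["t0"])
            gamma = 0.5 * (1.0 - math.tanh(z / 2.0))
            for p in group["params"]:
                if p.grad is None:
                    continue
                g = p.grad.add(p, alpha=wd)  # gradient plus L2 regularization
                w_norm = torch.norm(p)
                g_norm = torch.norm(g)
                # Layer-wise trust ratio, as in LARS.
                trust = torch.where(
                    (w_norm > 0) & (g_norm > 0),
                    w_norm / g_norm,
                    torch.ones_like(w_norm),
                )
                # Blend the raw learning rate (early exploration) with the
                # trust-ratio-scaled rate (LARS-like robustness later on).
                scaled_lr = lr * (gamma + (1.0 - gamma) * trust)
                p.add_(g, alpha=-float(scaled_lr))
        return loss
```

In this sketch, the steepness `lambda_` and midpoint `t0` stand in for the configurable sigmoid parameters mentioned in the abstract; where exactly the time-varying factor enters the layer-wise update, and its precise form, follow the paper and repository rather than this illustration.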