Learning rate range test for the vision transformer
Rinka Kiriyama, A. Sashima, I. Shimizu
International Conference on Images, Signals, and Computing, 2023-08-21
DOI: 10.1117/12.2692013 (https://doi.org/10.1117/12.2692013)
Citations: 0
Abstract
The solutions obtained by training a deep neural network depend strongly on the training parameters, including the learning rate (LR). Finding appropriate training settings is therefore very important. In particular, better settings are needed for state-of-the-art Vision Transformer (ViT) models, whose structure differs from that of ordinary models. In this paper, we focus on the learning rate and search for a better value using the Learning Rate Range Test (LRRT). Our experiments show that the appropriate LR lies at the point where the loss stops decreasing in the LRRT. We also discuss the effects of the number of epochs and of LR warm-up.
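To make the selection rule concrete, the following is a minimal sketch of an LRRT, not the authors' implementation. It assumes an exponentially increasing LR schedule, plain gradient descent on a toy quadratic loss in place of a ViT, and a trailing moving average to locate the point where the loss stops decreasing. All names (`lr_range_test`, `lr_where_decrease_stops`, `toy_loss`, `toy_grad`, `CURVATURES`) and all default values are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy problem standing in for a real model:
# loss(w) = 0.5 * sum(c_i * w_i^2), with one flat and one steep direction.
CURVATURES = [1.0, 10.0]


def toy_loss(w):
    return 0.5 * sum(c * wi * wi for c, wi in zip(CURVATURES, w))


def toy_grad(w):
    return [c * wi for c, wi in zip(CURVATURES, w)]


def lr_range_test(grad_fn, loss_fn, w0, lr_min=1e-5, lr_max=10.0, num_steps=100):
    """Run gradient descent while raising the LR exponentially each step.

    Returns a list of (lr, loss) pairs, where each loss is measured
    just before the update that uses that lr.
    """
    w = list(w0)
    ratio = (lr_max / lr_min) ** (1.0 / (num_steps - 1))
    history = []
    lr = lr_min
    for _ in range(num_steps):
        loss = loss_fn(w)
        if not math.isfinite(loss) or loss > 1e12:
            break  # stop once training has clearly diverged
        history.append((lr, loss))
        w = [wi - lr * gi for wi, gi in zip(w, grad_fn(w))]
        lr *= ratio
    return history


def lr_where_decrease_stops(history, window=5):
    """Return the LR at the minimum of the trailing moving-average loss."""
    losses = [loss for _, loss in history]
    smoothed = []
    for i in range(len(losses)):
        lo = max(0, i - window + 1)
        smoothed.append(sum(losses[lo:i + 1]) / (i + 1 - lo))
    best = min(range(len(smoothed)), key=smoothed.__getitem__)
    return history[best][0]


history = lr_range_test(toy_grad, toy_loss, [1.0, 1.0])
print("LR where the loss stops decreasing:", lr_where_decrease_stops(history))
```

In a real ViT experiment, the toy loss and gradient would be replaced by mini-batch training steps, and the smoothing window would need tuning, because mini-batch losses are much noisier than this deterministic example.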