{"title":"混沌的渐近边作为神经网络训练的指导原则","authors":"Lin Zhang, Ling Feng, Kan Chen, Choy Heng Lai","doi":"10.1142/s2972335323500011","DOIUrl":null,"url":null,"abstract":"It has been recently demonstrated that optimal neural networks operate near the asymptotic edge of chaos for state-of-the-art feed-forward neural networks, where its generalization power is maximal due to the highest number of asymptotic metastable states. However, how to leverage this principle to improve the model training process remains open. Here, by mapping the model evolution during training to the phase diagram in the classic analytic result of Sherrington–Kirkpatrick model in spin glasses, we illustrate on a simple neural network model that one can provide principled training of the network without manually tuning the training hyper-parameters. In particular, we provide a semi-analytical method to set the optimal weight decay strength, such that the model will converge toward the edge of chaos during training. Consequently, such hyper-parameter setting leads the model to achieve the highest test accuracy. Another benefit for restricting the model at the edge of chaos is its robustness against the common practical problem of label noise, as we find that it automatically avoids fitting the shuffled labels in the training samples while maintaining good fitting to the correct labels, providing simple means of achieving good performance on noisy labels without any additional treatment.","PeriodicalId":68167,"journal":{"name":"人工智能与机器人研究","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Asymptotic Edge of Chaos as Guiding Principle for Neural Network Training\",\"authors\":\"Lin Zhang, Ling Feng, Kan Chen, Choy Heng Lai\",\"doi\":\"10.1142/s2972335323500011\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"It has been recently demonstrated that optimal neural networks operate near the asymptotic edge of chaos for state-of-the-art feed-forward neural networks, where its generalization power is maximal due to the highest number of asymptotic metastable states. However, how to leverage this principle to improve the model training process remains open. Here, by mapping the model evolution during training to the phase diagram in the classic analytic result of Sherrington–Kirkpatrick model in spin glasses, we illustrate on a simple neural network model that one can provide principled training of the network without manually tuning the training hyper-parameters. In particular, we provide a semi-analytical method to set the optimal weight decay strength, such that the model will converge toward the edge of chaos during training. Consequently, such hyper-parameter setting leads the model to achieve the highest test accuracy. 
Another benefit for restricting the model at the edge of chaos is its robustness against the common practical problem of label noise, as we find that it automatically avoids fitting the shuffled labels in the training samples while maintaining good fitting to the correct labels, providing simple means of achieving good performance on noisy labels without any additional treatment.\",\"PeriodicalId\":68167,\"journal\":{\"name\":\"人工智能与机器人研究\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-10-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"人工智能与机器人研究\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1142/s2972335323500011\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"人工智能与机器人研究","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/s2972335323500011","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Asymptotic Edge of Chaos as Guiding Principle for Neural Network Training
It has recently been demonstrated that optimal state-of-the-art feed-forward neural networks operate near the asymptotic edge of chaos, where their generalization power is maximal because the number of asymptotic metastable states is highest. However, how to leverage this principle to improve the model training process remains an open question. Here, by mapping the model's evolution during training onto the phase diagram of the classic analytic solution of the Sherrington–Kirkpatrick spin glass model, we illustrate on a simple neural network model that the network can be trained in a principled way without manually tuning the training hyper-parameters. In particular, we provide a semi-analytical method for setting the optimal weight decay strength such that the model converges toward the edge of chaos during training. With this hyper-parameter setting, the model achieves the highest test accuracy. A further benefit of restricting the model to the edge of chaos is robustness against the common practical problem of label noise: we find that the model automatically avoids fitting shuffled labels in the training samples while maintaining a good fit to the correct labels, providing a simple means of achieving good performance on noisy labels without any additional treatment.
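To make the ingredients of this recipe concrete, below is a minimal, hypothetical PyTorch sketch of the general setup the abstract describes: a small feed-forward network trained with a weight decay term, while simple weight statistics (the ratio of the mean coupling to its standard deviation) are logged as a rough proxy for where the model sits on a Sherrington–Kirkpatrick-like phase diagram. The synthetic data, architecture, fixed weight decay value, and the particular statistic monitored are all illustrative assumptions; the paper's actual semi-analytical formula for the optimal weight decay is not reproduced here.

import torch
import torch.nn as nn

# Illustrative sketch only: the data, network, and monitored statistic are
# assumptions for demonstration, not the authors' exact procedure.
torch.manual_seed(0)

# Synthetic binary classification data.
n_samples, n_features = 2000, 20
X = torch.randn(n_samples, n_features)
true_w = torch.randn(n_features)
y = (X @ true_w > 0).long()

model = nn.Sequential(
    nn.Linear(n_features, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
)

# weight_decay is the hyper-parameter the paper proposes to set
# semi-analytically; here it is just a fixed illustrative value.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

    # Crude proxy for the model's position on an SK-like phase diagram:
    # ratio of the mean of the output-layer weights to their standard deviation.
    w = model[2].weight.detach().flatten()
    ratio = (w.mean() / w.std()).item()
    if epoch % 10 == 0:
        print(f"epoch {epoch:3d}  loss {loss.item():.3f}  mean/std {ratio:+.3f}")

In this kind of setup, sweeping the weight_decay value and observing where the logged statistic settles would be one crude, empirical stand-in for the paper's principled choice of decay strength that drives the model toward the edge of chaos.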