{"title":"神经网络的熵正则化:自相似近似","authors":"Amir R. Asadi, Po-Ling Loh","doi":"10.1016/j.jspi.2024.106181","DOIUrl":null,"url":null,"abstract":"<div><p>This paper focuses on entropic regularization and its multiscale extension in neural network learning. We leverage established results that characterize the optimizer of entropic regularization methods and their connection with generalization bounds. To avoid the significant computational complexity involved in sampling from the optimal multiscale Gibbs distributions, we describe how to make measured concessions in optimality by using self-similar approximating distributions. We study such scale-invariant approximations for linear neural networks and further extend the approximations to neural networks with nonlinear activation functions. We then illustrate the application of our proposed approach through empirical simulation. By navigating the interplay between optimization and computational efficiency, our research contributes to entropic regularization theory, proposing a practical method that embraces symmetry across scales.</p></div>","PeriodicalId":0,"journal":{"name":"","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0378375824000387/pdfft?md5=fcc1f48fea9b9d957df56a1c168f3f74&pid=1-s2.0-S0378375824000387-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Entropic regularization of neural networks: Self-similar approximations\",\"authors\":\"Amir R. Asadi, Po-Ling Loh\",\"doi\":\"10.1016/j.jspi.2024.106181\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>This paper focuses on entropic regularization and its multiscale extension in neural network learning. We leverage established results that characterize the optimizer of entropic regularization methods and their connection with generalization bounds. To avoid the significant computational complexity involved in sampling from the optimal multiscale Gibbs distributions, we describe how to make measured concessions in optimality by using self-similar approximating distributions. We study such scale-invariant approximations for linear neural networks and further extend the approximations to neural networks with nonlinear activation functions. We then illustrate the application of our proposed approach through empirical simulation. 
By navigating the interplay between optimization and computational efficiency, our research contributes to entropic regularization theory, proposing a practical method that embraces symmetry across scales.</p></div>\",\"PeriodicalId\":0,\"journal\":{\"name\":\"\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0,\"publicationDate\":\"2024-04-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S0378375824000387/pdfft?md5=fcc1f48fea9b9d957df56a1c168f3f74&pid=1-s2.0-S0378375824000387-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0378375824000387\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"","FirstCategoryId":"100","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0378375824000387","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Entropic regularization of neural networks: Self-similar approximations
This paper focuses on entropic regularization and its multiscale extension in neural network learning. We leverage established results that characterize the optimizer of entropic regularization methods and their connection with generalization bounds. To avoid the significant computational complexity involved in sampling from the optimal multiscale Gibbs distributions, we describe how to make measured concessions in optimality by using self-similar approximating distributions. We study such scale-invariant approximations for linear neural networks and further extend the approximations to neural networks with nonlinear activation functions. We then illustrate the application of our proposed approach through empirical simulation. By navigating the interplay between optimization and computational efficiency, our research contributes to entropic regularization theory, proposing a practical method that embraces symmetry across scales.
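For context on the "established results" the abstract refers to, the following is a minimal, self-contained LaTeX statement of the standard (single-scale) Gibbs variational principle behind entropic regularization. The notation (prior P, loss L, inverse temperature beta) is illustrative and not taken from the paper, and the paper's multiscale extension is not reproduced here; this is only a sketch of the classical result.

% Sketch: the classical Gibbs variational principle for single-scale entropic
% regularization. Notation is illustrative (assumed), not the paper's own.
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
Let $P$ be a prior distribution over weights $w$, let $L$ be a training loss,
and let $\beta > 0$ be an inverse temperature. Then
\begin{align*}
  Q^{\star}
  &= \operatorname*{arg\,min}_{Q \ll P}
     \Big\{ \mathbb{E}_{W \sim Q}\!\big[L(W)\big]
            + \tfrac{1}{\beta}\, D_{\mathrm{KL}}(Q \,\|\, P) \Big\},
  &
  \frac{\mathrm{d}Q^{\star}}{\mathrm{d}P}(w)
  &= \frac{e^{-\beta L(w)}}{\mathbb{E}_{W \sim P}\big[e^{-\beta L(W)}\big]},
\end{align*}
i.e.\ the optimizer of the entropically regularized objective is a Gibbs
distribution. Sampling from such distributions (and, in the paper, from their
multiscale analogues) is computationally demanding, which motivates the
self-similar approximating distributions studied there.
\end{document}

The second identity is the standard characterization: the minimizer tilts the prior by the exponentiated negative loss, normalized by the corresponding partition function.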