M. Kohler, A. Krzyżak
Bernoulli 27(1), pp. 2564-2597, published 2021-11-01
DOI: 10.3150/21-BEJ1323
Over-parametrized deep neural networks minimizing the empirical risk do not generalize well
Several recent papers showed that backpropagation is able to find the global minimum of the empirical risk on the training data using over-parametrized deep neural networks. In this paper, a similar result is shown for deep neural networks with the sigmoidal squasher activation function in a regression setting, and a lower bound is presented which proves that these networks do not generalize well on new data, in the sense that networks minimizing the empirical risk do not achieve the optimal minimax rate of convergence for the estimation of smooth regression functions.
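The phenomenon the abstract describes can be illustrated with a toy sketch (not the paper's construction, and using piecewise-linear interpolation as a simple stand-in for an over-parametrized network): an estimator that drives the empirical risk to zero by interpolating noisy training data can have a larger prediction error than a smoother, more biased fit. All names and numbers below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m = lambda x: np.sin(2.0 * np.pi * x)  # smooth regression function

# Noisy regression sample: Y = m(X) + eps
n, sigma = 30, 1.0
x_train = np.sort(rng.uniform(0.0, 1.0, n))
y_train = m(x_train) + rng.normal(0.0, sigma, n)

# "Interpolating" estimator: piecewise-linear interpolation hits every
# training point exactly, so its empirical risk is exactly zero.
interp = lambda x: np.interp(x, x_train, y_train)

# Smoother estimator: cubic least-squares fit (nonzero empirical risk).
coef = np.polyfit(x_train, y_train, 3)
smooth = lambda x: np.polyval(coef, x)

# Empirical risk of the interpolant is zero ...
train_mse_interp = np.mean((interp(x_train) - y_train) ** 2)

# ... but its error against the true regression function m is larger
# than that of the smoother fit.
x_test = np.linspace(x_train[0], x_train[-1], 2000)
test_mse_interp = np.mean((interp(x_test) - m(x_test)) ** 2)
test_mse_smooth = np.mean((smooth(x_test) - m(x_test)) ** 2)
print(train_mse_interp, test_mse_interp, test_mse_smooth)
```

This only mirrors the qualitative message (zero empirical risk need not imply good generalization); the paper's actual lower bound concerns minimax rates for neural network estimates, which the sketch makes no attempt to reproduce.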
About the journal:
BERNOULLI is the journal of the Bernoulli Society for Mathematical Statistics and Probability, issued four times per year. The journal provides a comprehensive account of important developments in the fields of statistics and probability, offering an international forum for both theoretical and applied work.
BERNOULLI will publish:
Papers containing original and significant research contributions, with background, mathematical derivation, and discussion of the results in suitable detail and, where appropriate, discussion of interesting applications of the proposed methodology.
Papers of the following two types will also be considered for publication, provided they are judged to enhance the dissemination of research:
Review papers which provide an integrated critical survey of some area of probability and statistics and discuss important recent developments.
Scholarly papers on historically significant aspects of statistics and probability.