{"title":"Generalization Performance of Empirical Risk Minimization on Over-Parameterized Deep ReLU Nets","authors":"Shao-Bo Lin;Yao Wang;Ding-Xuan Zhou","doi":"10.1109/TIT.2025.3531048","DOIUrl":null,"url":null,"abstract":"In this paper, we study the generalization performance of global minima of empirical risk minimization (ERM) on over-parameterized deep ReLU nets. Using a novel deepening scheme for deep ReLU nets, we rigorously prove that there exist perfect global minima achieving optimal generalization error rates for numerous types of data under mild conditions. Since over-parameterization of deep ReLU nets is crucial to guarantee that the global minima of ERM can be realized by the widely used stochastic gradient descent (SGD) algorithm, our results present a potential way to fill the gap between optimization and generalization of deep learning.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 3","pages":"1978-1993"},"PeriodicalIF":2.2000,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10844907","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Information Theory","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10844907/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
In this paper, we study the generalization performance of global minima of empirical risk minimization (ERM) on over-parameterized deep ReLU nets. Using a novel deepening scheme for deep ReLU nets, we rigorously prove that there exist perfect global minima achieving optimal generalization error rates for numerous types of data under mild conditions. Since over-parameterization of deep ReLU nets is crucial to guarantee that the global minima of ERM can be realized by the widely used stochastic gradient descent (SGD) algorithm, our results present a potential way to fill the gap between optimization and generalization of deep learning.
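The existence of "perfect global minima" in the over-parameterized regime can be illustrated with a minimal numerical sketch (this is not the paper's deepening construction; the sample size, width, sine target, and the random-feature setup below are illustrative assumptions). With a one-hidden-layer ReLU net whose width m exceeds the sample size n, the hidden-feature matrix almost surely has full row rank, so a least-squares fit of the output weights drives the empirical risk to (numerical) zero, i.e., reaches a global minimum of ERM:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression task: n samples, m >> n hidden ReLU units (over-parameterized).
n, m = 10, 200
X = np.linspace(-1, 1, n)[:, None]       # (n, 1) inputs
y = np.sin(np.pi * X).ravel()            # (n,) smooth target (illustrative choice)

# Random, fixed hidden layer; only the output weights are optimized.
W = rng.normal(size=(m, 1))
b = rng.normal(size=m)
H = np.maximum(X @ W.T + b, 0.0)         # (n, m) ReLU feature matrix

# With m > n the rows of H almost surely span R^n, so least squares
# interpolates the data exactly: the empirical squared risk is
# (numerically) zero -- a global minimum of ERM.
a, *_ = np.linalg.lstsq(H, y, rcond=None)
train_mse = np.mean((H @ a - y) ** 2)
print(f"train MSE: {train_mse:.2e}")     # essentially zero
```

The interesting question the paper addresses is which of the (many) such interpolating minima also generalize well; the sketch only shows that over-parameterization makes zero empirical risk attainable at all.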
About the journal:
The IEEE Transactions on Information Theory is a journal that publishes theoretical and experimental papers concerned with the transmission, processing, and utilization of information. The boundaries of acceptable subject matter are intentionally not sharply delimited. Rather, it is hoped that as the focus of research activity changes, a flexible policy will permit this Transactions to follow suit. Current appropriate topics are best reflected by recent Tables of Contents; they are summarized in the titles of editorial areas that appear on the inside front cover.