A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models

IF 8.7, Zone 1 (Mathematics), JCR Q1: MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Annual Review of Statistics and Its Application, Pub Date: 2024-11-21, DOI: 10.1146/annurev-statistics-040522-013920

Namjoon Suh, Guang Cheng


Citations: 0

Abstract


A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models

In this article, we review the literature on statistical theories of neural networks from three perspectives: approximation, training dynamics, and generative models. In the first part, results on excess risks for neural networks are reviewed in the nonparametric framework of regression. These results rely on explicit constructions of neural networks, leading to fast convergence rates of excess risks. Nonetheless, their underlying analysis only applies to the global minimizer in the highly nonconvex landscape of deep neural networks. This motivates us to review the training dynamics of neural networks in the second part. Specifically, we review articles that attempt to answer the question of how a neural network trained via gradient-based methods finds a solution that can generalize well on unseen data. In particular, two well-known paradigms are reviewed: the neural tangent kernel and mean-field paradigms. Last, we review the most recent theoretical advancements in generative models, including generative adversarial networks, diffusion models, and in-context learning in large language models from two of the same perspectives, approximation and training dynamics.


Source journal

Annual Review of Statistics and Its Application MATHEMATICS, INTERDISCIPLINARY APPLICATIONS-STATISTICS & PROBABILITY

CiteScore: 13.40

Self-citation rate: 1.30%


About the journal: The Annual Review of Statistics and Its Application publishes comprehensive review articles on methodological advances in statistics and on the computational tools that facilitate them. It is abstracted and indexed in Scopus, Science Citation Index Expanded, and Inspec.