学习和泛化单隐层神经网络，超越标准高斯数据

2022 56th Annual Conference on Information Sciences and Systems (CISS) Pub Date : 2022-03-09 DOI:10.1109/CISS53076.2022.9751184

Hongkang Li, Shuai Zhang, M. Wang

{"title":"学习和泛化单隐层神经网络，超越标准高斯数据","authors":"Hongkang Li, Shuai Zhang, M. Wang","doi":"10.1109/CISS53076.2022.9751184","DOIUrl":null,"url":null,"abstract":"This paper analyzes the convergence and generalization of training a one-hidden-layer neural network when the input features follow the Gaussian mixture model consisting of a finite number of Gaussian distributions. Assuming the labels are generated from a teacher model with an unknown ground truth weight, the learning problem is to estimate the underlying teacher model by minimizing a non-convex risk function over a student neural network. With a finite number of training samples, referred to the sample complexity, the iterations are proved to converge linearly to a critical point with guaranteed generalization error. In addition, for the first time, this paper characterizes the impact of the input distributions on the sample complexity and the learning rate.","PeriodicalId":305918,"journal":{"name":"2022 56th Annual Conference on Information Sciences and Systems (CISS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Learning and generalization of one-hidden-layer neural networks, going beyond standard Gaussian data\",\"authors\":\"Hongkang Li, Shuai Zhang, M. Wang\",\"doi\":\"10.1109/CISS53076.2022.9751184\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper analyzes the convergence and generalization of training a one-hidden-layer neural network when the input features follow the Gaussian mixture model consisting of a finite number of Gaussian distributions. Assuming the labels are generated from a teacher model with an unknown ground truth weight, the learning problem is to estimate the underlying teacher model by minimizing a non-convex risk function over a student neural network. With a finite number of training samples, referred to the sample complexity, the iterations are proved to converge linearly to a critical point with guaranteed generalization error. In addition, for the first time, this paper characterizes the impact of the input distributions on the sample complexity and the learning rate.\",\"PeriodicalId\":305918,\"journal\":{\"name\":\"2022 56th Annual Conference on Information Sciences and Systems (CISS)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-03-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 56th Annual Conference on Information Sciences and Systems (CISS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CISS53076.2022.9751184\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 56th Annual Conference on Information Sciences and Systems (CISS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CISS53076.2022.9751184","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

摘要

本文分析了当输入特征遵循由有限个高斯分布组成的高斯混合模型时，单隐层神经网络训练的收敛性和泛化性。假设标签是由一个具有未知基础真值权重的教师模型生成的，学习问题是通过最小化学生神经网络上的非凸风险函数来估计潜在的教师模型。在训练样本数量有限的情况下，参考样本复杂度，证明迭代线性收敛到一个保证泛化误差的临界点。此外，本文还首次刻画了输入分布对样本复杂度和学习率的影响。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Learning and generalization of one-hidden-layer neural networks, going beyond standard Gaussian data

This paper analyzes the convergence and generalization of training a one-hidden-layer neural network when the input features follow the Gaussian mixture model consisting of a finite number of Gaussian distributions. Assuming the labels are generated from a teacher model with an unknown ground truth weight, the learning problem is to estimate the underlying teacher model by minimizing a non-convex risk function over a student neural network. With a finite number of training samples, referred to the sample complexity, the iterations are proved to converge linearly to a critical point with guaranteed generalization error. In addition, for the first time, this paper characterizes the impact of the input distributions on the sample complexity and the learning rate.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2022 56th Annual Conference on Information Sciences and Systems (CISS)

自引率

0.00%

发文量