Trainability of ReLU networks and Data-dependent Initialization.

arXiv: Learning. Pub Date: 2019-07-23. DOI: 10.1615/.2020034126

Yeonjong Shin, G. Karniadakis


Citations: 2

Abstract



Trainability of ReLU networks and Data-dependent Initialization.

In this paper, we study the trainability of rectified linear unit (ReLU) networks. A ReLU neuron is said to be dead if it outputs only a constant for every input. Two death states of neurons are introduced: tentative and permanent death. A network is then said to be trainable if the number of permanently dead neurons is sufficiently small for a learning task. We refer to the probability of a network being trainable as trainability. We show that a network being trainable is a necessary condition for successful training and that trainability serves as an upper bound on the rate of successful training. To quantify trainability, we study the probability distribution of the number of active neurons at initialization. In many applications, over-specified or over-parameterized neural networks are successfully employed and shown to be trained effectively. With the notion of trainability, we show that over-parameterization is both a necessary and a sufficient condition for minimizing the training loss. Furthermore, we propose a data-dependent initialization method in the over-parameterized setting. Numerical examples are provided to demonstrate the effectiveness of the method and our theoretical findings.
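The idea of studying the number of active neurons at initialization can be illustrated with a small Monte Carlo sketch. This is not the authors' analysis or their initialization method. It assumes a one-hidden-layer network, standard normal weights and biases, and it treats a neuron as dead on a data set when its pre-activation is non-positive for every sample, so its ReLU output is the constant zero. The function names `count_active_neurons` and `estimate_active_distribution` are hypothetical, and the sketch does not distinguish tentative from permanent death.

```python
import numpy as np


def count_active_neurons(X, W, b):
    """Count hidden ReLU neurons that fire on at least one sample in X.

    X: (n_samples, d) inputs, W: (width, d) weights, b: (width,) biases.
    A neuron whose pre-activation is <= 0 on every sample outputs the
    constant 0 on the data and is counted as dead here (an assumption).
    """
    pre_activation = X @ W.T + b            # shape (n_samples, width)
    return int(np.any(pre_activation > 0, axis=0).sum())


def estimate_active_distribution(X, width, trials=1000, seed=0):
    """Empirical distribution of the number of active neurons at initialization.

    Weights and biases are drawn i.i.d. from N(0, 1); this choice is an
    illustrative assumption, not a scheme taken from the paper.
    Returns an array p of length width + 1, where p[k] is the fraction of
    random initializations with exactly k active neurons.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    counts = np.zeros(width + 1, dtype=int)
    for _ in range(trials):
        W = rng.normal(size=(width, d))
        b = rng.normal(size=width)
        counts[count_active_neurons(X, W, b)] += 1
    return counts / trials


if __name__ == "__main__":
    # Toy 1-D data set on [0, 1]; purely illustrative.
    X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
    p = estimate_active_distribution(X, width=5, trials=2000)
    for k, prob in enumerate(p):
        print(f"P(active = {k}) ~ {prob:.3f}")
```

Increasing `width` in this sketch shifts probability mass toward larger active-neuron counts, which gives an informal feel for why over-parameterization is connected to trainability in the abstract.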
