An adaptive and stability-promoting layerwise training approach for sparse deep neural network architecture

IF 6.9 | Zone 1, Engineering and Technology | Q1 ENGINEERING, MULTIDISCIPLINARY

Computer Methods in Applied Mechanics and Engineering | Pub Date: 2025-04-01 | DOI: 10.1016/j.cma.2025.117938

C.G. Krishnanunni, Tan Bui-Thanh

Volume 441, Article 117938 | https://www.sciencedirect.com/science/article/pii/S0045782525002105

Citations: 0

Abstract

This work presents a two-stage adaptive framework for progressively developing deep neural network (DNN) architectures that generalize well for a given training data set. In the first stage, a layerwise training approach is adopted where a new layer is added each time and trained independently by freezing parameters in the previous layers. We impose desirable structures on the DNN by employing manifold regularization, sparsity regularization, and physics-informed terms. We introduce an $\varepsilon$-$\delta$ stability-promoting concept as a desirable property for a learning algorithm and show that employing manifold regularization yields an $\varepsilon$-$\delta$ stability-promoting algorithm. Further, we also derive the necessary conditions for the trainability of a newly added layer and investigate the training saturation problem. In the second stage of the algorithm (post-processing), a sequence of shallow networks is employed to extract information from the residual produced in the first stage, thereby improving the prediction accuracy. Numerical investigations on prototype regression and classification problems demonstrate that the proposed approach can outperform fully connected DNNs of the same size. Moreover, by equipping the physics-informed neural network (PINN) with the proposed adaptive architecture strategy to solve partial differential equations, we numerically show that adaptive PINNs not only are superior to standard PINNs but also produce interpretable hidden layers with provable stability. We also apply our architecture design strategy to solve inverse problems governed by elliptic partial differential equations.
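The two stages described in the abstract can be pictured with a small NumPy sketch. This is not the authors' implementation: the toy data, the `fit_block` helper, the layer widths, the plain gradient-descent loop, and all hyperparameters are assumptions chosen for illustration. Only an L1 sparsity penalty is included; the manifold regularization, physics-informed terms, and trainability conditions from the paper are omitted.

```python
import numpy as np

def fit_block(H, target, width, l1=1e-4, lr=0.02, steps=3000, seed=0, zero_head=False):
    """Fit one tanh layer (W, b) plus a linear head (v, c) to `target`.

    H holds fixed inputs: raw data or the outputs of frozen earlier layers.
    Hypothetical helper: full-batch gradient descent on MSE + l1 * |W|.
    """
    rng = np.random.default_rng(seed)
    n, d = H.shape
    W = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, width))
    b = np.zeros(width)
    if zero_head:
        v = np.zeros((width, 1))  # block starts as a zero correction
    else:
        v = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, 1))
    c = np.zeros(1)
    for _ in range(steps):
        Z = np.tanh(H @ W + b)
        g_pred = 2.0 * (Z @ v + c - target) / n      # d(MSE)/d(prediction)
        g_A = (g_pred @ v.T) * (1.0 - Z ** 2)        # backprop through tanh
        g_v = Z.T @ g_pred
        g_c = g_pred.sum(axis=0)
        g_W = H.T @ g_A + l1 * np.sign(W)            # L1 subgradient for sparsity
        g_b = g_A.sum(axis=0)
        W -= lr * g_W
        b -= lr * g_b
        v -= lr * g_v
        c -= lr * g_c
    return W, b, v, c

def predict(H, params):
    W, b, v, c = params
    return np.tanh(H @ W + b) @ v + c

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

# Toy 1-D regression data (assumed for illustration only).
X = np.linspace(-2.0, 2.0, 64).reshape(-1, 1)
y = np.sin(X)

# Stage 1: add one layer at a time; earlier layers stay frozen.
H, frozen, stage_losses = X, [], []
for k in range(3):
    params = fit_block(H, y, width=8, seed=k)
    stage1_pred = predict(H, params)
    stage_losses.append(mse(stage1_pred, y))
    W, b, _, _ = params
    frozen.append((W.copy(), b.copy()))
    H = np.tanh(H @ W + b)  # frozen features become the next layer's input

# Stage 2: a sequence of shallow networks fitted to the remaining residual.
pred = stage1_pred
for j in range(2):
    corrector = fit_block(X, y - pred, width=4, seed=100 + j, zero_head=True)
    pred = pred + predict(X, corrector)
final_loss = mse(pred, y)
```

In this sketch each new layer is trained together with a temporary linear head, and only the layer itself is kept when the next one is added. The stage-two correctors start from a zero output head so that each begins as a zero correction to the stage-one prediction; that initialization is a design choice of the sketch, not something stated in the abstract.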


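Source journal

Computer Methods in Applied Mechanics and Engineering (Engineering and Technology: Engineering, Multidisciplinary)

CiteScore: 12.70

Self-citation rate: 15.30%

Articles published: 719

Review time: 44 days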

About the journal: Computer Methods in Applied Mechanics and Engineering stands as a cornerstone in the realm of computational science and engineering. With a history spanning over five decades, the journal has been a key platform for disseminating papers on advanced mathematical modeling and numerical solutions. Interdisciplinary in nature, these contributions encompass mechanics, mathematics, computer science, and various scientific disciplines. The journal welcomes a broad range of computational methods addressing the simulation, analysis, and design of complex physical problems, making it a vital resource for researchers in the field.