Yi-Tian Song, Yan-Ning Sun, Li-Lan Liu, Jie Wu, Zeng-Gui Gao, Wei Qin
Journal of Manufacturing Systems, vol. 82, pp. 1252-1268. Published 2025-09-02. DOI: 10.1016/j.jmsy.2025.08.012. Impact factor: 14.2; JCR Q1 (Engineering, Industrial). URL: https://www.sciencedirect.com/science/article/pii/S0278612525002110
Online sequential decision making of multi-stage assembly process parameters based on deep reinforcement learning and its application in diesel engine production
Maintaining fixed parameters during batch assembly of complex mechanical products often results in inconsistent quality because operating conditions vary over time: equipment performance degrades, the production environment is disturbed, and operator skill varies. Online parameter adaptation is therefore needed to counteract progressive quality deviations. Although complex assemblies inherently involve sequential multi-stage workflows across distributed stations, conventional optimization strategies often adjust all parameters monolithically, neglecting error propagation and inter-stage quality interdependencies. To address the dual challenges of dynamic operating conditions and multi-stage coordination, this study proposes an online sequential decision-making framework based on deep reinforcement learning. First, a causal inference model for assembly quality prognosis is constructed by integrating the greedy equivalence search algorithm with domain-specific expert knowledge, enabling systematic modeling of multi-stage quality dependencies. Next, the multi-stage parameter optimization problem is formalized as a Markov decision process in which the state space is the assembly progress, the action space is the adjustment range of the parameters, and the physics-informed reward function is derived from the quality inference results. Building on this, the proximal policy optimization algorithm is improved with stage-aware experience replay and gradient alignment constraints to learn the optimal policy, from which the optimal action is then selected. Experiments on a real-world diesel engine assembly dataset demonstrate a 17.16% improvement in product qualification probability, significantly outperforming conventional methods. The proposed framework effectively captures time-varying assembly characteristics and achieves cross-stage parameter coordination through sequential decision-making, offering a novel data-driven solution for quality control in complex product assembly systems.
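To make the Markov decision process formulation in the abstract more concrete, the following is a minimal sketch of a multi-stage environment in which one parameter adjustment is chosen per stage and a quality score is returned once the last stage is finished. It is not the authors' implementation: the class `MultiStageAssemblyEnv`, the `targets`, `max_adjust`, and `carry_over` values, and the toy `_quality` function (a stand-in for the causal quality inference model) are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MultiStageAssemblyEnv:
    """Toy MDP: one parameter adjustment is chosen at each assembly stage."""
    targets: tuple = (0.2, -0.1, 0.3)  # hypothetical ideal adjustment per stage
    max_adjust: float = 0.5            # hypothetical bound on each adjustment
    carry_over: float = 0.5            # hypothetical share of error passed downstream
    stage: int = 0
    chosen: list = field(default_factory=list)

    def reset(self):
        self.stage = 0
        self.chosen = []
        return self._state()

    def _state(self):
        # State = assembly progress plus the adjustments already committed upstream.
        return (self.stage, tuple(self.chosen))

    def step(self, action):
        # Action = parameter adjustment, clipped to the allowed range.
        clipped = max(-self.max_adjust, min(self.max_adjust, action))
        self.chosen.append(clipped)
        self.stage += 1
        done = self.stage == len(self.targets)
        # Reward is only available once the whole assembly is finished.
        reward = self._quality() if done else 0.0
        return self._state(), reward, done

    def _quality(self):
        # Stand-in for a quality inference model: each stage's deviation is
        # partly carried into the next stage, mimicking error propagation.
        error, total = 0.0, 0.0
        for adjust, target in zip(self.chosen, self.targets):
            error = self.carry_over * error + (adjust - target)
            total += error ** 2
        return 1.0 / (1.0 + total)


def run_episode(env, policy):
    """Roll out one assembly; `policy` maps a state to an adjustment."""
    state = env.reset()
    done, reward = False, 0.0
    while not done:
        state, reward, done = env.step(policy(state))
    return reward


env = MultiStageAssemblyEnv()
fixed_reward = run_episode(env, lambda state: 0.0)                    # fixed parameters
ideal_reward = run_episode(env, lambda state: env.targets[state[0]])  # stage-aware choice
```

In this toy setting the stage-aware policy scores higher than the fixed-parameter one because it removes the deviation at every stage before it can propagate. A learned policy, such as one trained with proximal policy optimization, would take the place of the hand-written lambdas.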
About the journal:
The Journal of Manufacturing Systems is dedicated to showcasing cutting-edge fundamental and applied research in manufacturing at the systems level. Encompassing products, equipment, people, information, control, and support functions, manufacturing systems play a pivotal role in the economical and competitive development, production, delivery, and total lifecycle of products, meeting market and societal needs.
With a commitment to publishing archival scholarly literature, the journal strives to advance the state of the art in manufacturing systems and foster innovation in crafting efficient, robust, and sustainable manufacturing systems. The focus extends from equipment-level considerations to the broader scope of the extended enterprise. The Journal welcomes research addressing challenges across various scales, including nano, micro, and macro-scale manufacturing, and spanning diverse sectors such as aerospace, automotive, energy, and medical device manufacturing.