Online sequential decision making of multi-stage assembly process parameters based on deep reinforcement learning and its application in diesel engine production

IF 14.2 | Zone 1, Engineering Technology | JCR Q1, ENGINEERING, INDUSTRIAL

Journal of Manufacturing Systems Pub Date : 2025-09-02 DOI:10.1016/j.jmsy.2025.08.012

Yi-Tian Song , Yan-Ning Sun , Li-Lan Liu , Jie Wu , Zeng-Gui Gao , Wei Qin

Journal Article. Journal of Manufacturing Systems, Volume 82, Pages 1252-1268. https://www.sciencedirect.com/science/article/pii/S0278612525002110

Citations: 0

Abstract

Holding process parameters fixed during batch assembly of complex mechanical products often leads to inconsistent quality, because operating conditions vary over time: equipment performance degrades, the production environment is disturbed, and operator skill varies. Online parameter adaptation is therefore needed to counteract progressive quality deviations. Complex assemblies are inherently sequential, multi-stage workflows spread across distributed stations, yet conventional optimization strategies often adjust parameters monolithically, neglecting error propagation effects and inter-stage quality interdependencies. To address the dual challenges of dynamic operating conditions and multi-stage coordination, this study proposes an online sequential decision-making framework based on deep reinforcement learning. First, a causal inference model for assembly quality prognosis is constructed by integrating the greedy equivalence search algorithm with domain-specific expert knowledge, enabling systematic modeling of multi-stage quality dependencies. Next, the multi-stage parameter optimization problem is formalized as a Markov decision process in which the state space is the assembly progress, the action space is the adjustment range of the parameters, and a physics-informed reward function is derived from the quality inference results. The proximal policy optimization algorithm is then improved with stage-aware experience replay and gradient alignment constraints to learn the optimal policy, from which the optimal action is selected. Experiments on a real-world diesel engine assembly dataset demonstrate a 17.16% improvement in product qualification probability, significantly outperforming conventional methods. The proposed framework effectively captures time-varying assembly characteristics and achieves cross-stage parameter coordination through sequential decision-making, offering a novel data-driven solution for quality control in complex product assembly systems.
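
The Markov decision process formulation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the stage names, adjustment ranges, drift value, the additive error-propagation rule, and the `qualification_probability` function are all hypothetical stand-ins for the paper's causal quality model, and the hand-written compensating policy stands in for the improved proximal policy optimization agent. The state here is assumed to be the stage index plus the carried quality deviation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One assembly station with a bounded parameter adjustment (hypothetical)."""
    name: str
    low: float
    high: float


def qualification_probability(deviation: float) -> float:
    """Toy stand-in for the quality inference model: 1.0 at zero deviation."""
    return math.exp(-deviation ** 2)


class AssemblyEnv:
    """Episodic MDP: one decision per stage, reward only after the last stage."""

    def __init__(self, stages, drift):
        self.stages = list(stages)
        self.drift = drift          # assumed time-varying disturbance at episode start
        self.t = 0
        self.deviation = drift

    def reset(self):
        self.t = 0
        self.deviation = self.drift
        return (self.t, self.deviation)

    def step(self, action):
        stage = self.stages[self.t]
        # Action space: the adjustment is clipped to the stage's allowed range.
        applied = min(max(action, stage.low), stage.high)
        # Assumed propagation rule: the deviation carries over to the next stage.
        self.deviation += applied
        self.t += 1
        done = self.t == len(self.stages)
        reward = qualification_probability(self.deviation) if done else 0.0
        return (self.t, self.deviation), reward, done


def rollout(env, policy):
    """Run one episode and return the total reward."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total


def fixed_policy(state):
    """Baseline: never adjust any parameter."""
    return 0.0


def compensating_policy(state):
    """Hand-written stand-in for a learned policy: cancel the carried deviation."""
    _, deviation = state
    return -deviation


STAGES = [
    Stage("stage_1", -0.3, 0.3),
    Stage("stage_2", -0.3, 0.3),
    Stage("stage_3", -0.5, 0.5),
]

env = AssemblyEnv(STAGES, drift=0.8)
fixed_score = rollout(env, fixed_policy)
sequential_score = rollout(env, compensating_policy)
print(f"fixed: {fixed_score:.3f}, sequential: {sequential_score:.3f}")
```

In this toy setting no single stage has enough adjustment range to cancel the drift alone, so the correction has to be spread across stages. That is the cross-stage coordination the abstract refers to, in a highly simplified form.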

Source journal

Journal of Manufacturing Systems (Engineering Technology: Industrial Engineering)

CiteScore: 23.30

Self-citation rate: 13.20%

Articles published: 216

Review time: 25 days

About the journal: The Journal of Manufacturing Systems is dedicated to showcasing cutting-edge fundamental and applied research in manufacturing at the systems level. Encompassing products, equipment, people, information, control, and support functions, manufacturing systems play a pivotal role in the economical and competitive development, production, delivery, and total lifecycle of products, meeting market and societal needs. With a commitment to publishing archival scholarly literature, the journal strives to advance the state of the art in manufacturing systems and foster innovation in crafting efficient, robust, and sustainable manufacturing systems. The focus extends from equipment-level considerations to the broader scope of the extended enterprise. The Journal welcomes research addressing challenges across various scales, including nano, micro, and macro-scale manufacturing, and spanning diverse sectors such as aerospace, automotive, energy, and medical device manufacturing.