A deep reinforcement learning approach to quality prediction for production profitability optimization

Jan Mayer, Lisa-Marie Wienbrandt, David Michels, Roland Jochem
{"title":"A deep reinforcement learning approach to quality prediction for production profitability optimization","authors":"Jan Mayer ,&nbsp;Lisa-Marie Wienbrandt ,&nbsp;David Michels ,&nbsp;Roland Jochem","doi":"10.1016/j.dajour.2025.100643","DOIUrl":null,"url":null,"abstract":"<div><div>In high-variance manufacturing environments, the early detection and elimination of defective products is crucial for optimizing resource utilization and increasing profitability. Traditional quality control methods often fail to provide timely insights, especially in processes involving complex, multivariate sensor data. To address this gap, this study explores the use of Deep Reinforcement Learning, specifically Deep Q-Learning, as an early classifier for time series data. In order to facilitate the advantages of this approach, data obtained from semiconductor manufacturing, including multivariate sensor data and a final binary classification (good/bad product), is utilized. The proposed approach enables dynamic decision-making during the production process by modeling it as a Markov Decision Process and leveraging experience replay for stable learning. A novel, cost-sensitive reward function is introduced to account for class imbalance and to balance prediction earliness with accuracy. Five distinct models are optimized using hyperparameter tuning based on different classification metrics, and their performance is evaluated in terms of both predictive and economic outcomes. The model optimized with the F1-Metric achieves the best results, with an accuracy of 87% and a mean prediction time of just 1.26 process steps. Economically, this model results in a 22.4% reduction in production time and the highest profit gains across sensitivity analyses.</div></div>","PeriodicalId":100357,"journal":{"name":"Decision Analytics Journal","volume":"17 ","pages":"Article 100643"},"PeriodicalIF":0.0000,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Decision Analytics Journal","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772662225000992","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

In high-variance manufacturing environments, the early detection and elimination of defective products are crucial for optimizing resource utilization and increasing profitability. Traditional quality control methods often fail to provide timely insights, especially in processes involving complex, multivariate sensor data. To address this gap, this study explores the use of Deep Reinforcement Learning, specifically Deep Q-Learning, as an early classifier for time series data. To demonstrate the advantages of this approach, the study uses data from semiconductor manufacturing, comprising multivariate sensor readings and a final binary label (good/bad product). The proposed approach enables dynamic decision-making during the production process by modeling it as a Markov Decision Process and leveraging experience replay for stable learning. A novel, cost-sensitive reward function is introduced to account for class imbalance and to balance prediction earliness against accuracy. Five distinct models are optimized via hyperparameter tuning against different classification metrics, and their performance is evaluated in terms of both predictive and economic outcomes. The model optimized on the F1 metric achieves the best results, with an accuracy of 87% and a mean prediction time of just 1.26 process steps. Economically, this model yields a 22.4% reduction in production time and the highest profit gains across sensitivity analyses.
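The abstract's core ingredients can be sketched in code terms: the production process is a Markov Decision Process in which, at each process step, the agent either waits for the next sensor reading or commits to a good/bad classification, transitions are stored in a replay buffer for stable learning, and the reward trades off earliness against misclassification costs. The minimal Python sketch below illustrates this action space, a cost-sensitive reward, and a replay buffer; the reward weights (w_bad, delay_cost) and the earliness bonus are hypothetical placeholders, not the authors' actual function.

```python
import random
from collections import deque

# Actions for the early classifier: keep observing, or commit to a label.
WAIT, PREDICT_GOOD, PREDICT_BAD = 0, 1, 2

def reward(action, true_label, t, n_steps, w_bad=5.0, delay_cost=0.1):
    """Hypothetical cost-sensitive reward (weights are illustrative only):
    correct predictions earn a bonus that grows the earlier they are made,
    misclassifying a bad product (true_label == 1) as good is penalized
    hardest to reflect class imbalance, and every WAIT step pays a small
    delay cost that pressures the agent toward early decisions."""
    if action == WAIT:
        return -delay_cost
    predicted_bad = (action == PREDICT_BAD)
    if predicted_bad == (true_label == 1):            # correct decision
        return 1.0 + (n_steps - t) / n_steps          # earliness bonus
    return -w_bad if true_label == 1 else -1.0        # costly false "good"

# Minimal experience-replay buffer, as used for stable Q-learning.
replay = deque(maxlen=10_000)

def store(transition):
    replay.append(transition)                         # (s, a, r, s_next, done)

def sample(batch_size=32):
    return random.sample(replay, min(batch_size, len(replay)))
```

In a full Deep Q-Learning setup, a neural network would map the sensor history at step t to Q-values over these three actions, and minibatches drawn from sample() would drive the temporal-difference updates toward r + gamma * max Q(s_next).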