A deep reinforcement learning approach to quality prediction for production profitability optimization

Jan Mayer, Lisa-Marie Wienbrandt, David Michels, Roland Jochem
{"title":"A deep reinforcement learning approach to quality prediction for production profitability optimization","authors":"Jan Mayer ,&nbsp;Lisa-Marie Wienbrandt ,&nbsp;David Michels ,&nbsp;Roland Jochem","doi":"10.1016/j.dajour.2025.100643","DOIUrl":null,"url":null,"abstract":"<div><div>In high-variance manufacturing environments, the early detection and elimination of defective products is crucial for optimizing resource utilization and increasing profitability. Traditional quality control methods often fail to provide timely insights, especially in processes involving complex, multivariate sensor data. To address this gap, this study explores the use of Deep Reinforcement Learning, specifically Deep Q-Learning, as an early classifier for time series data. In order to facilitate the advantages of this approach, data obtained from semiconductor manufacturing, including multivariate sensor data and a final binary classification (good/bad product), is utilized. The proposed approach enables dynamic decision-making during the production process by modeling it as a Markov Decision Process and leveraging experience replay for stable learning. A novel, cost-sensitive reward function is introduced to account for class imbalance and to balance prediction earliness with accuracy. Five distinct models are optimized using hyperparameter tuning based on different classification metrics, and their performance is evaluated in terms of both predictive and economic outcomes. The model optimized with the F1-Metric achieves the best results, with an accuracy of 87% and a mean prediction time of just 1.26 process steps. Economically, this model results in a 22.4% reduction in production time and the highest profit gains across sensitivity analyses.</div></div>","PeriodicalId":100357,"journal":{"name":"Decision Analytics Journal","volume":"17 ","pages":"Article 100643"},"PeriodicalIF":0.0000,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Decision Analytics Journal","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772662225000992","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

In high-variance manufacturing environments, the early detection and elimination of defective products are crucial for optimizing resource utilization and increasing profitability. Traditional quality control methods often fail to provide timely insights, especially in processes involving complex, multivariate sensor data. To address this gap, this study explores the use of Deep Reinforcement Learning, specifically Deep Q-Learning, as an early classifier for time series data. To demonstrate the advantages of this approach, the study uses data from semiconductor manufacturing, comprising multivariate sensor readings and a final binary label (good/bad product). The proposed approach enables dynamic decision-making during the production process by modeling it as a Markov Decision Process and leveraging experience replay for stable learning. A novel, cost-sensitive reward function is introduced to account for class imbalance and to balance prediction earliness against accuracy. Five distinct models are optimized via hyperparameter tuning against different classification metrics, and their performance is evaluated in terms of both predictive and economic outcomes. The model optimized on the F1 metric achieves the best results, with an accuracy of 87% and a mean prediction time of just 1.26 process steps. Economically, this model yields a 22.4% reduction in production time and the highest profit gains across sensitivity analyses.
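The abstract's core ingredients can be sketched in code terms: the production process is a Markov Decision Process in which, at each process step, the agent either waits for the next sensor reading or commits to a good/bad classification, transitions are stored in a replay buffer for stable learning, and the reward trades off earliness against misclassification costs. The minimal Python sketch below illustrates this action space, a cost-sensitive reward, and a replay buffer; the reward weights (w_bad, delay_cost) and the earliness bonus are hypothetical placeholders, not the authors' actual function.

```python
import random
from collections import deque

# Actions for the early classifier: keep observing, or commit to a label.
WAIT, PREDICT_GOOD, PREDICT_BAD = 0, 1, 2

def reward(action, true_label, t, n_steps, w_bad=5.0, delay_cost=0.1):
    """Hypothetical cost-sensitive reward (weights are illustrative only):
    correct predictions earn a bonus that grows the earlier they are made,
    misclassifying a bad product (true_label == 1) as good is penalized
    hardest to reflect class imbalance, and every WAIT step pays a small
    delay cost that pressures the agent toward early decisions."""
    if action == WAIT:
        return -delay_cost
    predicted_bad = (action == PREDICT_BAD)
    if predicted_bad == (true_label == 1):            # correct decision
        return 1.0 + (n_steps - t) / n_steps          # earliness bonus
    return -w_bad if true_label == 1 else -1.0        # costly false "good"

# Minimal experience-replay buffer, as used for stable Q-learning.
replay = deque(maxlen=10_000)

def store(transition):
    replay.append(transition)                         # (s, a, r, s_next, done)

def sample(batch_size=32):
    return random.sample(replay, min(batch_size, len(replay)))
```

In a full Deep Q-Learning setup, a neural network would map the sensor history at step t to Q-values over these three actions, and minibatches drawn from sample() would drive the temporal-difference updates toward r + gamma * max Q(s_next).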