A deep reinforcement learning approach to quality prediction for production profitability optimization

Jan Mayer, Lisa-Marie Wienbrandt, David Michels, Roland Jochem

Decision Analytics Journal, Volume 17, Article 100643, published 2025-09-30. DOI: 10.1016/j.dajour.2025.100643

In high-variance manufacturing environments, the early detection and elimination of defective products is crucial for optimizing resource utilization and increasing profitability. Traditional quality control methods often fail to provide timely insights, especially in processes involving complex, multivariate sensor data. To address this gap, this study explores the use of Deep Reinforcement Learning, specifically Deep Q-Learning, as an early classifier for time series data. To demonstrate the advantages of this approach, data from semiconductor manufacturing is used, comprising multivariate sensor readings and a final binary label (good/bad product). The proposed approach enables dynamic decision-making during the production process by modeling it as a Markov Decision Process and leveraging experience replay for stable learning. A novel, cost-sensitive reward function is introduced to account for class imbalance and to balance prediction earliness with accuracy. Five distinct models are optimized using hyperparameter tuning based on different classification metrics, and their performance is evaluated in terms of both predictive and economic outcomes. The model optimized on the F1 metric achieves the best results, with an accuracy of 87% and a mean prediction time of just 1.26 process steps. Economically, this model yields a 22.4% reduction in production time and the highest profit gains across sensitivity analyses.
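The cost-sensitive reward described in the abstract can be illustrated with a minimal sketch. The function below is a hypothetical shape, not the paper's actual formulation: it assumes three actions (wait, predict good, predict bad), a per-step waiting cost `lambda_time`, an earliness bonus that decays linearly over the `T` process steps, and a `minority_weight` that penalizes misclassifying the rare defective class more heavily to address class imbalance.

```python
def reward(action, true_label, t, T, lambda_time=0.05, minority_weight=4.0):
    """Hypothetical cost-sensitive reward for early time-series classification.

    action:     "wait", "good", or "bad" (the agent's prediction)
    true_label: "good" or "bad" (final quality outcome of the product)
    t, T:       current process step and total number of process steps
    """
    if action == "wait":
        # Small per-step cost discourages postponing the decision indefinitely.
        return -lambda_time
    correct = (action == true_label)
    earliness = 1.0 - t / T  # linear bonus for deciding early
    if correct:
        return 1.0 + earliness
    # Misclassifying the rare "bad" (defective) class costs more than
    # misclassifying the common "good" class.
    return -(minority_weight if true_label == "bad" else 1.0)
```

Tuning `lambda_time` against the earliness bonus is what trades prediction speed off against accuracy in such a scheme.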
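The experience replay used for stable learning can be sketched as a fixed-size buffer of transitions; this is a generic illustration under the usual Deep Q-Learning conventions, not the authors' implementation.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size experience replay buffer.

    Sampling random minibatches of (state, action, reward, next_state, done)
    tuples breaks the temporal correlation of consecutive process steps,
    which is what stabilizes Deep Q-Learning updates.
    """

    def __init__(self, capacity=10_000):
        # deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch; never asks for more than is stored.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)
```

During training the agent would push one transition per process step and periodically draw a minibatch to update the Q-network.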