Effective Deep Reinforcement Learning for Dynamic Machine Allocation: A Case Study on Metal Sputtering Tools

IF 2.3 · JCR Q2 (Engineering, Electrical & Electronic) · CAS Tier 3 (Engineering & Technology)
Hsin-Tzu Hsu; Shi-Chung Chang
DOI: 10.1109/TSM.2025.3579970
Journal: IEEE Transactions on Semiconductor Manufacturing, vol. 38, no. 3, pp. 430-438
Published: 2025-06-16
URL: https://ieeexplore.ieee.org/document/11037289/
Citations: 0

Abstract

Dynamic Machine Allocation (DMA) is a vital aspect of production scheduling in semiconductor manufacturing. Current DMA practices heavily rely on engineers’ domain expertise and require a few days of manual adjustments in response to rapid but significant fab changes, for example, due to unfamiliar economic shifts. Slow and heuristic DMA policy adaptation very often leads to production shortfalls. To reduce dependence on human expertise and speed up quality responses to changes, we design a framework of effective deep reinforcement learning (DRL) for DMA. Design innovations of the framework include (1) a discrete-event simulator for predicting production flows among machines with state, DMA action and reward aligned to fab practices; (2) a DRL neural network output transformation module that ensures action feasibility in task compatibility and machine availability; and (3) a DRL-based, two-stage agent of DMA policy learning that integrates DRL with optimization techniques for both efficient computation and quality DMA. Operation simulation by using the DMA case and data of a metal sputtering machine group demonstrates that our DRL-based design effectively learns DMA policies in different scenarios, each within one hour. In throughput performance, learned policies surpass a traditional heuristic by 3% to 20%. Our framework and the DRL-based method designs are generic and applicable to DMA of various machine groups.
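The abstract's second design innovation, an output transformation that restricts the policy network to actions that are feasible in both task compatibility and machine availability, corresponds to the common action-masking technique in DRL. The paper does not publish its implementation; the sketch below is a minimal, hypothetical illustration of the idea, with all function and parameter names invented for this example.

```python
import numpy as np

def feasible_action_probs(logits, compatible, available):
    """Transform raw policy-network logits into a probability
    distribution over feasible machine-allocation actions only.

    logits     : (n_machines,) raw network outputs for one task
    compatible : (n_machines,) bool, machine can process this task type
    available  : (n_machines,) bool, machine is currently free
    """
    logits = np.asarray(logits, dtype=float)
    feasible = np.asarray(compatible) & np.asarray(available)
    if not feasible.any():
        raise ValueError("no feasible machine for this task")
    # Mask infeasible actions by sending their logits to -inf,
    # so they receive exactly zero probability after the softmax.
    masked = np.where(feasible, logits, -np.inf)
    masked -= masked[feasible].max()  # shift for numerical stability
    exp = np.exp(masked)
    return exp / exp.sum()

# Example: 4 machines; machine 2 is incompatible, machine 3 is down.
probs = feasible_action_probs(
    [1.2, 0.4, 3.0, 2.0],
    compatible=[True, True, False, True],
    available=[True, True, True, False],
)
```

Masking before the softmax (rather than rejecting and resampling infeasible actions) keeps the policy gradient well defined over the feasible set, which is one plausible way a transformation module like the paper's could guarantee feasibility by construction.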
Source journal

IEEE Transactions on Semiconductor Manufacturing (Engineering & Technology – Engineering: Electrical & Electronic)
CiteScore: 5.20
Self-citation rate: 11.10%
Articles per year: 101
Average review time: 3.3 months
Journal description: The IEEE Transactions on Semiconductor Manufacturing addresses the challenging problems of manufacturing complex microelectronic components, especially very large scale integrated circuits (VLSI). Manufacturing these products requires precision micropatterning, precise control of materials properties, ultraclean work environments, and complex interactions of chemical, physical, electrical and mechanical processes.