{"title":"Model-Based OPC With Adaptive PID Control Through Reinforcement Learning","authors":"Taeyoung Kim;Shilong Zhang;Youngsoo Shin","doi":"10.1109/TSM.2025.3528735","DOIUrl":null,"url":null,"abstract":"Model-based optical proximity correction (MB- OPC) relies on a feedback loop, in which correction result, measured as edge placement error (EPE), is used for decision of next correction. A proportional-integral-derivative (PID) control is a popular mechanism employed for such feedback loop, but current MB-OPC usually relies only on P control. This is because there is no systematic way to customize P, I, and D coefficients for different layouts in different OPC iterations.We apply reinforcement learning (RL) to construct the trained actor that adaptively yields PID coefficients within the correction loop. The RL model consists of an actor and a critic. We perform supervised pre-training to quickly set the initial weights of RL model, with the actor mimicking standard MB-OPC. Subsequently, the critic is trained to predict accurate Q-value, the cumulative reward from OPC correction. The actor is then trained to maximize this Q-value. Experiments are performed with aggressive target maximum EPE values. The proposed OPC for test layouts requires 5 to 7 iterations, while standard MB-OPC (with constant coefficient-based control) completes in 20 to 28 iterations. This reduces OPC runtime to about 1/2.7 on average. In addition, maximum EPE is also reduced by about 24%.","PeriodicalId":451,"journal":{"name":"IEEE Transactions on Semiconductor Manufacturing","volume":"38 1","pages":"48-56"},"PeriodicalIF":2.3000,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Semiconductor Manufacturing","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10847731/","RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
Model-based optical proximity correction (MB-OPC) relies on a feedback loop in which the correction result, measured as edge placement error (EPE), is used to decide the next correction. Proportional-integral-derivative (PID) control is a popular mechanism for such a feedback loop, but current MB-OPC usually relies on P control alone, because there is no systematic way to customize the P, I, and D coefficients for different layouts in different OPC iterations. We apply reinforcement learning (RL) to construct a trained actor that adaptively yields PID coefficients within the correction loop. The RL model consists of an actor and a critic. We perform supervised pre-training to quickly set the initial weights of the RL model, with the actor mimicking standard MB-OPC. The critic is then trained to predict an accurate Q-value, the cumulative reward from OPC correction, and the actor is subsequently trained to maximize this Q-value. Experiments are performed with aggressive target maximum EPE values. The proposed OPC requires 5 to 7 iterations on the test layouts, while standard MB-OPC (with constant-coefficient control) completes in 20 to 28 iterations. This reduces OPC runtime to about 1/2.7 on average. In addition, the maximum EPE is reduced by about 24%.
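To make the control mechanism concrete, the sketch below illustrates in Python how per-segment mask corrections could be computed from EPE with PID control, with the P, I, and D coefficients supplied by an actor at each iteration. This is a minimal illustration under assumed names: the functions, the toy lithography model, and the constant-gain actor are hypothetical stand-ins, not the authors' implementation; standard MB-OPC corresponds to the actor returning a constant P gain with zero I and D terms.

# Illustrative sketch: PID-controlled edge-segment correction in an MB-OPC loop,
# with per-iteration PID gains supplied by an actor (all names hypothetical).
import numpy as np

def pid_correction(epe, integral, prev_epe, kp, ki, kd):
    """Compute mask-segment movement from EPE using PID control."""
    integral = integral + epe                 # I term: accumulated EPE
    derivative = epe - prev_epe               # D term: change in EPE
    move = kp * epe + ki * integral + kd * derivative
    return move, integral

def run_opc(target_edges, simulate_contour, actor, max_iter=10, epe_tol=0.5):
    """Iterate segment corrections until the maximum EPE falls below epe_tol.

    simulate_contour: lithography model returning printed edge positions.
    actor: policy mapping the current EPE vector to (kp, ki, kd);
           standard MB-OPC would return a constant (kp, 0, 0).
    """
    mask_edges = target_edges.copy()
    integral = np.zeros_like(target_edges)
    prev_epe = np.zeros_like(target_edges)
    for _ in range(max_iter):
        printed = simulate_contour(mask_edges)
        epe = printed - target_edges          # edge placement error per segment
        if np.max(np.abs(epe)) < epe_tol:
            break
        kp, ki, kd = actor(epe)               # adaptive PID coefficients
        move, integral = pid_correction(epe, integral, prev_epe, kp, ki, kd)
        mask_edges = mask_edges - move        # shift segments to reduce EPE
        prev_epe = epe
    return mask_edges

# Example with a toy linear lithography model and a fixed-gain "actor":
if __name__ == "__main__":
    target = np.array([10.0, 20.0, 30.0])
    toy_model = lambda m: 0.8 * m + 2.0       # stand-in for contour simulation
    constant_actor = lambda epe: (0.9, 0.05, 0.1)
    print(run_opc(target, toy_model, constant_actor))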
Journal description:
The IEEE Transactions on Semiconductor Manufacturing addresses the challenging problems of manufacturing complex microelectronic components, especially very large scale integrated circuits (VLSI). Manufacturing these products requires precision micropatterning, precise control of materials properties, ultraclean work environments, and complex interactions of chemical, physical, electrical and mechanical processes.