Daehwan Lho, Hyunwook Park, Keunwoo Kim, Seongguk Kim, Boogyo Sim, Kyungjune Son, Keeyoung Son, Jihun Kim, Seonguk Choi, Joonsang Park, Haeyeon Kim, Kyubong Kong, Joungho Kim
DOI: 10.1109/EPEPS53828.2022.9947119 · 2022 IEEE 31st Conference on Electrical Performance of Electronic Packaging and Systems (EPEPS) · Published 2022-10-09
Deterministic Policy Gradient-based Reinforcement Learning for DDR5 Memory Signaling Architecture Optimization considering Signal Integrity
In this paper, we propose deterministic policy gradient-based reinforcement learning for DDR5 memory signaling architecture optimization considering signal integrity. We formulate the complex DDR5 memory signaling architecture optimization as a Markov decision process (MDP). The key limiting factor was identified through analysis of the hierarchical channel, and the MDP was configured to address it. A deterministic policy is essential for optimizing high-dimensional problems with many continuous design parameters. For verification, we compare the proposed method with conventional methods, random search (RS) and Bayesian optimization (BO), and with other reinforcement learning algorithms, advantage actor-critic (A2C) and proximal policy optimization (PPO). RS and BO could not reach a proper optimum even after 10,000 and 1,000 iterations, respectively, and A2C and PPO failed to optimize. The comparison shows that the proposed method achieves the highest optimality, low computing time, and reusability.
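The core idea the abstract describes, using a deterministic policy whose parameters are continuous design variables updated along a reward gradient, can be illustrated with a minimal toy sketch. This is not the paper's method or simulator: the `si_reward` function below is a made-up quadratic stand-in for the signal-integrity score (the paper uses eye-diagram metrics from hierarchical channel simulation), and the gradient is estimated by finite differences rather than a learned critic.

```python
import numpy as np

# Hypothetical stand-in for the signal-integrity simulator: it scores a
# vector of continuous design parameters (e.g. trace widths, via pitches).
# A smooth quadratic with a known optimum is used purely for illustration.
OPT = np.array([0.8, 0.3, 0.5, 0.6])  # assumed "best" design (made up)

def si_reward(a: np.ndarray) -> float:
    """Higher is better; peaks at the toy optimal design OPT."""
    return float(-np.sum((a - OPT) ** 2))

def dpg_optimize(dim=4, lr=0.1, iters=500, eps=1e-3, seed=0):
    """Deterministic-policy-style search over continuous design parameters.

    The policy is deterministic (a single design vector); its gradient is
    estimated by central finite differences on the reward, standing in for
    the critic-gradient step of deterministic policy gradient methods.
    """
    rng = np.random.default_rng(seed)
    a = rng.uniform(0.0, 1.0, size=dim)  # initial design parameters
    for _ in range(iters):
        grad = np.zeros(dim)
        for i in range(dim):  # central finite-difference gradient estimate
            d = np.zeros(dim)
            d[i] = eps
            grad[i] = (si_reward(a + d) - si_reward(a - d)) / (2 * eps)
        a = np.clip(a + lr * grad, 0.0, 1.0)  # deterministic gradient ascent
    return a, si_reward(a)

design, score = dpg_optimize()
```

Because the update follows an explicit gradient rather than sampling from a stochastic policy, each step moves the whole design vector at once, which is why deterministic policies scale better to many continuous parameters than the stochastic exploration of A2C or PPO in settings like this.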