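Deterministic Policy Gradient-based Reinforcement Learning for DDR5 Memory Signaling Architecture Optimization Considering Signal Integrity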

Daehwan Lho, Hyunwook Park, Keunwoo Kim, Seongguk Kim, Boogyo Sim, Kyungjune Son, Keeyoung Son, Jihun Kim, Seonguk Choi, Joonsang Park, Haeyeon Kim, Kyubong Kong, Joungho Kim
Published in: 2022 IEEE 31st Conference on Electrical Performance of Electronic Packaging and Systems (EPEPS)
Publication date: 2022-10-09
DOI: 10.1109/EPEPS53828.2022.9947119 (https://doi.org/10.1109/EPEPS53828.2022.9947119)
Citations: 1

Abstract

In this paper, we propose a deterministic policy gradient-based reinforcement learning method for optimizing the DDR5 memory signaling architecture with signal integrity taken into account. We formulate the complex DDR5 memory signaling architecture optimization as a Markov decision process (MDP). The key limiting factor was identified by analyzing the hierarchical channel, and the MDP was configured to address it. A deterministic policy is essential for optimizing high-dimensional problems that have many continuous design parameters. For verification, we compare the proposed method with conventional methods such as random search (RS) and Bayesian optimization (BO), and with other reinforcement learning algorithms such as advantage actor-critic (A2C) and proximal policy optimization (PPO). RS and BO did not reach a properly optimized design even after 10,000 and 1,000 iterations, respectively, and A2C and PPO failed to optimize. In this comparison, the proposed method achieved the highest optimality with low computing time, and it is reusable.
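The abstract describes two ideas that a toy example may help make concrete: casting a continuous design-parameter search as an MDP, and updating a deterministic policy along the gradient of a critic with respect to the action. The sketch below is only an illustration under loud assumptions and is not the authors' implementation: the three "design parameters", the `TARGET` optimum, the quadratic `reward` (a placeholder for whatever signal-integrity metric a channel simulation would return), the per-parameter linear policy, and the analytic one-step critic are all hypothetical. A real setup would learn the critic from simulated samples instead of differentiating a known formula.

```python
import random

# Hypothetical optimum of the toy metric (assumption, not from the paper).
TARGET = [0.3, -0.5, 0.8]


def reward(design):
    """Placeholder signal-integrity score: higher is better, peak at TARGET."""
    return -sum((d - t) ** 2 for d, t in zip(design, TARGET))


def step(state, action):
    """One MDP transition: the action adjusts the continuous design parameters."""
    next_state = [s + a for s, a in zip(state, action)]
    return next_state, reward(next_state)


class DeterministicPolicy:
    """Deterministic policy mu(s) = gain * s + bias, applied per parameter."""

    def __init__(self, dim):
        self.gain = [0.0] * dim
        self.bias = [0.0] * dim

    def act(self, state):
        return [g * s + b for g, s, b in zip(self.gain, state, self.bias)]


def dq_da(state, action):
    """Gradient of the one-step critic Q(s, a) = reward(s + a) w.r.t. the action."""
    return [-2.0 * (s + a - t) for s, a, t in zip(state, action, TARGET)]


def train(policy, episodes=2000, lr=0.05, seed=0):
    """Gradient ascent on the policy parameters via the chain rule
    dQ/dtheta = dQ/da * dmu/dtheta (deterministic policy gradient, toy form)."""
    rng = random.Random(seed)
    for _ in range(episodes):
        state = [rng.uniform(-1.0, 1.0) for _ in policy.bias]
        action = policy.act(state)
        grad_a = dq_da(state, action)
        for i, (s, g) in enumerate(zip(state, grad_a)):
            policy.gain[i] += lr * g * s  # dmu/dgain = s
            policy.bias[i] += lr * g      # dmu/dbias = 1
    return policy


def random_search(iterations, seed=0):
    """Baseline: sample random designs and keep the best score seen."""
    rng = random.Random(seed)
    return max(
        reward([rng.uniform(-1.0, 1.0) for _ in TARGET])
        for _ in range(iterations)
    )


if __name__ == "__main__":
    trained = train(DeterministicPolicy(len(TARGET)))
    start = [0.2, 0.9, -0.4]
    _, score = step(start, trained.act(start))
    print("policy score:", score)
    print("random search score:", random_search(1000))
```

Once trained, the policy can be queried from any starting design without further search, which is one way to read the reusability mentioned in the abstract; the random search baseline has to be rerun from scratch each time.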