Daehwan Lho, Hyunwook Park, Keunwoo Kim, Seongguk Kim, Boogyo Sim, Kyungjune Son, Keeyoung Son, Jihun Kim, Seonguk Choi, Joonsang Park, Haeyeon Kim, Kyubong Kong, Joungho Kim
DOI: 10.1109/EPEPS53828.2022.9947119 · 2022 IEEE 31st Conference on Electrical Performance of Electronic Packaging and Systems (EPEPS) · Published 2022-10-09
Deterministic Policy Gradient-based Reinforcement Learning for DDR5 Memory Signaling Architecture Optimization considering Signal Integrity
In this paper, we propose deterministic policy gradient-based reinforcement learning for DDR5 memory signaling architecture optimization considering signal integrity. We formulate the complex DDR5 memory signaling architecture optimization as a Markov decision process (MDP). The key limiting factor was identified through analysis of the hierarchical channel, and the MDP was configured to address it. A deterministic policy is essential for optimizing high-dimensional problems with many continuous design parameters. For verification, we compare the proposed method with conventional methods, random search (RS) and Bayesian optimization (BO), and with other reinforcement learning algorithms, advantage actor-critic (A2C) and proximal policy optimization (PPO). RS and BO could not reach a proper optimum even after 10,000 and 1,000 iterations, respectively, and A2C and PPO failed to optimize. The comparison shows that the proposed method achieves the highest optimality, low computing time, and reusability.
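The core idea the abstract describes, using a deterministic policy whose parameters are continuous design variables updated along a reward gradient, can be illustrated with a minimal toy sketch. This is not the paper's method or simulator: the `si_reward` function below is a made-up quadratic stand-in for the signal-integrity score (the paper uses eye-diagram metrics from hierarchical channel simulation), and the gradient is estimated by finite differences rather than a learned critic.

```python
import numpy as np

# Hypothetical stand-in for the signal-integrity simulator: it scores a
# vector of continuous design parameters (e.g. trace widths, via pitches).
# A smooth quadratic with a known optimum is used purely for illustration.
OPT = np.array([0.8, 0.3, 0.5, 0.6])  # assumed "best" design (made up)

def si_reward(a: np.ndarray) -> float:
    """Higher is better; peaks at the toy optimal design OPT."""
    return float(-np.sum((a - OPT) ** 2))

def dpg_optimize(dim=4, lr=0.1, iters=500, eps=1e-3, seed=0):
    """Deterministic-policy-style search over continuous design parameters.

    The policy is deterministic (a single design vector); its gradient is
    estimated by central finite differences on the reward, standing in for
    the critic-gradient step of deterministic policy gradient methods.
    """
    rng = np.random.default_rng(seed)
    a = rng.uniform(0.0, 1.0, size=dim)  # initial design parameters
    for _ in range(iters):
        grad = np.zeros(dim)
        for i in range(dim):  # central finite-difference gradient estimate
            d = np.zeros(dim)
            d[i] = eps
            grad[i] = (si_reward(a + d) - si_reward(a - d)) / (2 * eps)
        a = np.clip(a + lr * grad, 0.0, 1.0)  # deterministic gradient ascent
    return a, si_reward(a)

design, score = dpg_optimize()
```

Because the update follows an explicit gradient rather than sampling from a stochastic policy, each step moves the whole design vector at once, which is why deterministic policies scale better to many continuous parameters than the stochastic exploration of A2C or PPO in settings like this.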