Taeyoung Kim, Xin Huang, Hai-Bao Chen, V. Sukharev, S. Tan
{"title":"Learning-based dynamic reliability management for dark silicon processor considering EM effects","authors":"Taeyoung Kim, Xin Huang, Hai-Bao Chen, V. Sukharev, S. Tan","doi":"10.3850/9783981537079_0441","DOIUrl":null,"url":null,"abstract":"In this article, we propose a new dynamic reliability management (DRM) technique for emerging dark silicon manycore processors. We formulate our DRM problem as minimizing the energy consumption subject to the reliability, performance and thermal constraints. The new approach is based on a newly proposed physics-based electromigration (EM) reliability model to predict the EM reliability of full-chip power grid networks. We consider thermal design power (TDP) as the power constraint for a dark silicon manycore processor. We employ both dynamic voltage and frequency scaling (DVFS) and dark silicon core using ON/OFF pulsing action as the two control knobs. To solve the problem, we apply the adaptive Q-learning based method, which is suitable for runtime operation as it can provide cost-effective yet good solutions. A large class of multithreaded applications is used as the benchmark to validate and compare the proposed dynamic reliability management methods. Experimental results on a 64-core dark silicon chip show that the proposed DRM algorithm can effectively reduce the energy consumption of a dark silicon manycore system when the system is not tightly constrained. The proposed method can outperform a simple global DVFS method significantly in this case.","PeriodicalId":311352,"journal":{"name":"2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3850/9783981537079_0441","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 20
Abstract
In this article, we propose a new dynamic reliability management (DRM) technique for emerging dark silicon manycore processors. We formulate our DRM problem as minimizing the energy consumption subject to the reliability, performance and thermal constraints. The new approach is based on a newly proposed physics-based electromigration (EM) reliability model to predict the EM reliability of full-chip power grid networks. We consider thermal design power (TDP) as the power constraint for a dark silicon manycore processor. We employ both dynamic voltage and frequency scaling (DVFS) and dark silicon core using ON/OFF pulsing action as the two control knobs. To solve the problem, we apply the adaptive Q-learning based method, which is suitable for runtime operation as it can provide cost-effective yet good solutions. A large class of multithreaded applications is used as the benchmark to validate and compare the proposed dynamic reliability management methods. Experimental results on a 64-core dark silicon chip show that the proposed DRM algorithm can effectively reduce the energy consumption of a dark silicon manycore system when the system is not tightly constrained. The proposed method can outperform a simple global DVFS method significantly in this case.