M. Salehi, M. Shafique, F. Kriebel, Semeen Rehman, Mohammad Khavari Tavana, A. Ejlali, J. Henkel
{"title":"dsReliM: Power-constrained reliability management in Dark-Silicon many-core chips under process variations","authors":"M. Salehi, M. Shafique, F. Kriebel, Semeen Rehman, Mohammad Khavari Tavana, A. Ejlali, J. Henkel","doi":"10.1109/CODESISSS.2015.7331370","DOIUrl":null,"url":null,"abstract":"Due to the tight power envelope, in the future technology nodes it is envisaged that not all cores in a many-core chip can be simultaneously powered-on (at full performance level). The power-gated cores are referred to as Dark Silicon. At the same time, growing reliability issues due to process variations and soft errors challenge the cost-effective deployment of future technology nodes. This paper presents a reliability management system for Dark Silicon chips (dsReliM) that optimizes for reliability of on-chip systems while jointly accounting for soft errors, process variations and the thermal design power (TDP) constraint. Towards the TDP-constrained reliability optimization, dsReliM leverages multiple reliable application versions that can potentially execute on different cores with frequency variations and supporting differenst voltage-frequency levels, thus facilitating distinct power, reliability and performance tradeoffs at run time. Experiments show that our dsReliM system provides up to 20% reliability improvements under different TDP constraints when compared to a state-of-the-art technique. Also, compared to an ideal-case optimal solution, dsReliM deviates up to 2.5% in terms of reliability efficiency, but speeds up the reliability management decision time by a factor of up to 3100.","PeriodicalId":281383,"journal":{"name":"2015 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CODESISSS.2015.7331370","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 20
Abstract
Due to the tight power envelope, in the future technology nodes it is envisaged that not all cores in a many-core chip can be simultaneously powered-on (at full performance level). The power-gated cores are referred to as Dark Silicon. At the same time, growing reliability issues due to process variations and soft errors challenge the cost-effective deployment of future technology nodes. This paper presents a reliability management system for Dark Silicon chips (dsReliM) that optimizes for reliability of on-chip systems while jointly accounting for soft errors, process variations and the thermal design power (TDP) constraint. Towards the TDP-constrained reliability optimization, dsReliM leverages multiple reliable application versions that can potentially execute on different cores with frequency variations and supporting differenst voltage-frequency levels, thus facilitating distinct power, reliability and performance tradeoffs at run time. Experiments show that our dsReliM system provides up to 20% reliability improvements under different TDP constraints when compared to a state-of-the-art technique. Also, compared to an ideal-case optimal solution, dsReliM deviates up to 2.5% in terms of reliability efficiency, but speeds up the reliability management decision time by a factor of up to 3100.