BRAVO: Balanced Reliability-Aware Voltage Optimization

Karthik Swaminathan, Nandhini Chandramoorthy, Chen-Yong Cher, Ramon Bertran Monfort, A. Buyuktosunoglu, P. Bose
{"title":"BRAVO: Balanced Reliability-Aware Voltage Optimization","authors":"Karthik Swaminathan, Nandhini Chandramoorthy, Chen-Yong Cher, Ramon Bertran Monfort, A. Buyuktosunoglu, P. Bose","doi":"10.1109/HPCA.2017.56","DOIUrl":null,"url":null,"abstract":"Defining a processor micro-architecture for a targeted productspace involves multi-dimensional optimization across performance, power and reliability axes. A key decision in sucha definition process is the circuit-and technology-driven parameterof the nominal (voltage, frequency) operating point. This is a challenging task, since optimizing individually orpair-wise amongst these metrics usually results in a designthat falls short of the specification in at least one of the threedimensions. Aided by academic research, industry has nowadopted early-stage definition methodologies that considerboth energy-and performance-related metrics. Reliabilityrelatedenhancements, on the other hand, tend to get factoredin via a separate thread of activity. This task is typically pursuedwithout thorough pre-silicon quantifications of the energyor even the performance cost. In the late-CMOS designera, reliability needs to move from a post-silicon afterthoughtor validation-only effort to a pre-silicon definitionprocess. In this paper, we present BRAVO, a methodologyfor such reliability-aware design space exploration. BRAVOis supported by a multi-core simulation framework that integratesperformance, power and reliability modeling capability. Errors induced by both soft and hard fault incidence arecaptured within the reliability models. We introduce the notionof the Balanced Reliability Metric (BRM), that we useto evaluate overall reliability of the processor across soft andhard error incidences. We demonstrate up to 79% improvementin reliability in terms of this metric, for only a 6% dropin overall energy efficiency over design points that maximizeenergy efficiency. We also demonstrate several real-life usecaseapplications of BRAVO in an industrial setting.","PeriodicalId":118950,"journal":{"name":"2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2017.56","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 25

Abstract

Defining a processor micro-architecture for a targeted productspace involves multi-dimensional optimization across performance, power and reliability axes. A key decision in sucha definition process is the circuit-and technology-driven parameterof the nominal (voltage, frequency) operating point. This is a challenging task, since optimizing individually orpair-wise amongst these metrics usually results in a designthat falls short of the specification in at least one of the threedimensions. Aided by academic research, industry has nowadopted early-stage definition methodologies that considerboth energy-and performance-related metrics. Reliabilityrelatedenhancements, on the other hand, tend to get factoredin via a separate thread of activity. This task is typically pursuedwithout thorough pre-silicon quantifications of the energyor even the performance cost. In the late-CMOS designera, reliability needs to move from a post-silicon afterthoughtor validation-only effort to a pre-silicon definitionprocess. In this paper, we present BRAVO, a methodologyfor such reliability-aware design space exploration. BRAVOis supported by a multi-core simulation framework that integratesperformance, power and reliability modeling capability. Errors induced by both soft and hard fault incidence arecaptured within the reliability models. We introduce the notionof the Balanced Reliability Metric (BRM), that we useto evaluate overall reliability of the processor across soft andhard error incidences. We demonstrate up to 79% improvementin reliability in terms of this metric, for only a 6% dropin overall energy efficiency over design points that maximizeenergy efficiency. We also demonstrate several real-life usecaseapplications of BRAVO in an industrial setting.
平衡可靠性感知电压优化
为目标产品空间定义处理器微架构涉及跨性能、功率和可靠性轴的多维优化。这种定义过程中的一个关键决定是电路和技术驱动的标称工作点(电压、频率)参数。这是一项具有挑战性的任务,因为在这些指标之间单独或成对地优化通常会导致设计至少在三个维度中的一个方面低于规范。在学术研究的帮助下,工业界现在已经采用了考虑能源和性能相关指标的早期定义方法。另一方面,与可靠性相关的增强倾向于通过单独的活动线程来考虑。这项任务通常在没有对能量甚至性能成本进行彻底的硅前量化的情况下进行。在后期cmos设计中,可靠性需要从硅后的事后验证工作转向硅前的定义过程。在本文中,我们提出了BRAVO,一种用于这种可靠性感知设计空间探索的方法。BRAVOis由一个多核仿真框架支持,该框架集成了性能、功率和可靠性建模能力。在可靠性模型中捕获软故障和硬故障发生率引起的误差。我们引入了平衡可靠性度量(BRM)的概念,我们使用它来评估处理器在软错误和硬错误发生率下的整体可靠性。根据这一指标,我们在可靠性方面提高了79%,而在最大化能源效率的设计点上,整体能源效率仅下降了6%。我们还演示了BRAVO在工业环境中的几个实际用例应用。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信