Flexible Error Protection for Energy Efficient Reliable Architectures

2010 22nd International Symposium on Computer Architecture and High Performance Computing. Pub Date: 2010-10-27. DOI: 10.1109/SBAC-PAD.2010.37

Timothy N. Miller, Nagarjuna Surapaneni, R. Teodorescu


Citations: 21

Abstract

Technology scaling is having an increasingly detrimental effect on microprocessor reliability, with increased variability and higher susceptibility to errors. At the same time, as integration of chip multiprocessors increases, power consumption is becoming a significant bottleneck that could threaten their growth. Addressing these competing trends requires energy-efficient solutions to reliability problems. This paper presents a reliable multicore architecture that provides targeted error protection by adapting to the characteristics of individual cores and workloads, with the goal of providing reliability with minimum energy. The user can specify an acceptable reliability target for each chip, core, or application. The system then adjusts a range of parameters, including replication and supply voltage, to meet that reliability goal. In this multicore architecture, each core consists of a pair of pipelines that can run independently (running separate threads) or in concert (running the same thread and verifying results). Redundancy is enabled selectively, at functional unit granularity. The architecture also employs timing speculation to mitigate variation-induced timing errors and to reduce the power overhead of error protection. On-line control based on machine learning dynamically adjusts multiple parameters to minimize energy consumption. Evaluation shows that dynamic adaptation of voltage and redundancy can reduce the energy-delay product of a chip multiprocessor (CMP) by 30-60% compared to static dual modular redundancy.
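The selection problem the abstract describes (meet a user-specified reliability target at minimum energy by choosing supply voltage and redundancy) can be illustrated with a small sketch. This is not the paper's controller, which uses on-line machine learning; it is an exhaustive search over a toy model, and every name and number in it (`Config`, `RAW_ERROR`, `COVERAGE`, `REPLAY_PENALTY`, the voltage levels) is a hypothetical placeholder chosen for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

# Hypothetical raw error rate per operation at each supply voltage (volts).
# Lower voltage saves energy but raises the error rate. Values are made up.
RAW_ERROR = {0.8: 1e-4, 0.9: 1e-6, 1.0: 1e-8, 1.1: 1e-10}

COVERAGE = 0.999        # assumed fraction of errors caught when pipelines are paired
REPLAY_PENALTY = 100.0  # assumed relative delay cost of recovering from a detected error


@dataclass(frozen=True)
class Config:
    vdd: float        # supply voltage
    redundant: bool   # True: both pipelines run the same thread and compare results


def estimate(cfg: Config) -> tuple[float, float, float]:
    """Return (energy, delay, residual_error_rate) under the toy model."""
    raw = RAW_ERROR[cfg.vdd]
    # Dynamic energy scales roughly with V^2; redundancy doubles the work done.
    energy = cfg.vdd ** 2 * (2.0 if cfg.redundant else 1.0)
    # Detected errors cost time to recover from; undetected ones cost nothing here.
    delay = 1.0 + (raw * REPLAY_PENALTY if cfg.redundant else 0.0)
    # Only errors that escape detection count against the reliability target.
    residual = raw * (1.0 - COVERAGE) if cfg.redundant else raw
    return energy, delay, residual


def choose_config(target_error_rate: float) -> Optional[Config]:
    """Pick the feasible configuration with the lowest energy-delay product."""
    best, best_edp = None, float("inf")
    for vdd, redundant in product(RAW_ERROR, (False, True)):
        cfg = Config(vdd, redundant)
        energy, delay, residual = estimate(cfg)
        if residual <= target_error_rate and energy * delay < best_edp:
            best, best_edp = cfg, energy * delay
    return best  # None when no configuration meets the target


if __name__ == "__main__":
    for target in (5e-5, 5e-11, 5e-12):
        print(target, choose_config(target))
```

Under these made-up constants, a loose target is met without redundancy at a low voltage, while tighter targets force the paired-pipeline mode. A static dual modular redundancy baseline would correspond to fixing `redundant=True` for every workload regardless of the target.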



