一个可变感知的OpenMP环境,用于在共享fpu处理器集群上有效地执行可精确配置的计算

Abbas Rahimi, A. Marongiu, Rajesh K. Gupta, L. Benini
{"title":"一个可变感知的OpenMP环境,用于在共享fpu处理器集群上有效地执行可精确配置的计算","authors":"Abbas Rahimi, A. Marongiu, Rajesh K. Gupta, L. Benini","doi":"10.1109/CODES-ISSS.2013.6659022","DOIUrl":null,"url":null,"abstract":"We propose a tightly-coupled, multi-core cluster architecture with shared, variation-tolerant, and accuracy-reconfigurable floating-point units (FPUs). The resilient shared-FPUs dynamically characterize FP pipeline vulnerability (FPV) and expose it as metadata to a software scheduler for reducing the cost of error correction. To further reduce this cost, our programming and runtime environment also supports controlled approximate computation through a combination of design-time and runtime techniques. We provide OpenMP extensions (as custom directives) for FP computations to specify parts of a program that can be executed approximately. We use a profiling technique to identify tolerable error significance and error rate thresholds in error-tolerant image processing applications. This information guides an application-driven hardware FPU synthesis and optimization design flow to generate efficient FPUs. At runtime, the scheduler utilizes FPV metadata and promotes FPUs to accurate mode, or demotes them to approximate mode depending upon the code region requirements. We demonstrate the effectiveness of our approach (in terms of energy savings) on a 16-core tightly-coupled cluster with eight shared-FPUs for both error-tolerant and general-purpose error-intolerant applications.","PeriodicalId":163484,"journal":{"name":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"30","resultStr":"{\"title\":\"A variability-aware OpenMP environment for efficient execution of accuracy-configurable computation on shared-FPU processor clusters\",\"authors\":\"Abbas Rahimi, A. Marongiu, Rajesh K. Gupta, L. Benini\",\"doi\":\"10.1109/CODES-ISSS.2013.6659022\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose a tightly-coupled, multi-core cluster architecture with shared, variation-tolerant, and accuracy-reconfigurable floating-point units (FPUs). The resilient shared-FPUs dynamically characterize FP pipeline vulnerability (FPV) and expose it as metadata to a software scheduler for reducing the cost of error correction. To further reduce this cost, our programming and runtime environment also supports controlled approximate computation through a combination of design-time and runtime techniques. We provide OpenMP extensions (as custom directives) for FP computations to specify parts of a program that can be executed approximately. We use a profiling technique to identify tolerable error significance and error rate thresholds in error-tolerant image processing applications. This information guides an application-driven hardware FPU synthesis and optimization design flow to generate efficient FPUs. At runtime, the scheduler utilizes FPV metadata and promotes FPUs to accurate mode, or demotes them to approximate mode depending upon the code region requirements. We demonstrate the effectiveness of our approach (in terms of energy savings) on a 16-core tightly-coupled cluster with eight shared-FPUs for both error-tolerant and general-purpose error-intolerant applications.\",\"PeriodicalId\":163484,\"journal\":{\"name\":\"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)\",\"volume\":\"75 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-09-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"30\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CODES-ISSS.2013.6659022\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CODES-ISSS.2013.6659022","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 30

摘要

我们提出了一种紧密耦合的多核集群架构,具有共享、可变容忍和精度可重构的浮点单元(fpu)。弹性共享fpu动态表征FP管道漏洞(FPV),并将其作为元数据暴露给软件调度器,以减少错误纠正的成本。为了进一步降低这个成本,我们的编程和运行时环境还通过设计时和运行时技术的组合来支持受控的近似计算。我们为FP计算提供了OpenMP扩展(作为自定义指令),以指定程序中可以近似执行的部分。我们使用分析技术来识别容错图像处理应用中的可容忍错误显著性和错误率阈值。这些信息指导应用驱动的硬件FPU综合和优化设计流程,以生成高效的FPU。在运行时,调度程序利用FPV元数据,根据代码区域需求将FPV提升为精确模式,或将其降级为近似模式。我们在一个16核紧密耦合集群上演示了我们的方法的有效性(在节能方面),该集群具有8个共享fpu,用于容错和通用容错应用程序。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A variability-aware OpenMP environment for efficient execution of accuracy-configurable computation on shared-FPU processor clusters
We propose a tightly-coupled, multi-core cluster architecture with shared, variation-tolerant, and accuracy-reconfigurable floating-point units (FPUs). The resilient shared-FPUs dynamically characterize FP pipeline vulnerability (FPV) and expose it as metadata to a software scheduler for reducing the cost of error correction. To further reduce this cost, our programming and runtime environment also supports controlled approximate computation through a combination of design-time and runtime techniques. We provide OpenMP extensions (as custom directives) for FP computations to specify parts of a program that can be executed approximately. We use a profiling technique to identify tolerable error significance and error rate thresholds in error-tolerant image processing applications. This information guides an application-driven hardware FPU synthesis and optimization design flow to generate efficient FPUs. At runtime, the scheduler utilizes FPV metadata and promotes FPUs to accurate mode, or demotes them to approximate mode depending upon the code region requirements. We demonstrate the effectiveness of our approach (in terms of energy savings) on a 16-core tightly-coupled cluster with eight shared-FPUs for both error-tolerant and general-purpose error-intolerant applications.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信