VEAL: Virtualized Execution Accelerator for Loops

2008 International Symposium on Computer Architecture. Pub Date: 2008-06-01. DOI: 10.1145/1394608.1382155

Nathan Clark, Amir Hormati, S. Mahlke

Citations: 110

Abstract

Performance improvement solely through transistor scaling is becoming more and more difficult, so it is increasingly common to see domain-specific accelerators used in conjunction with general-purpose processors to achieve future performance goals. There is a serious drawback to accelerators, though: binary compatibility. An application compiled to use an accelerator cannot run on a processor without that accelerator, and applications that were not compiled for an accelerator will never use it. To overcome this problem, we propose decoupling the instruction set architecture from the underlying accelerators. Computation to be accelerated is expressed using a processor's baseline instruction set, and lightweight dynamic translation maps the representation to whatever accelerators are available in the system. In this paper, we describe the changes to a compilation framework and processor system needed to support this abstraction for an important set of accelerator designs that support innermost loops. In this analysis, we investigate the dynamic overheads associated with abstraction as well as the static/dynamic tradeoffs to improve the dynamic mapping of loop nests. As part of the exploration, we also provide a quantitative analysis of the hardware characteristics of effective loop accelerators. We conclude that using a hybrid static-dynamic compilation approach to map computation onto loop-level accelerators is a practical way to increase computation efficiency, without the overheads associated with instruction set modification.
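The decoupling idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the `Loop` and `Accelerator` classes, the opcode names, and the `can_map` rule are all hypothetical stand-ins chosen for illustration. The sketch only shows the compatibility property described above: a loop is written once in baseline operations, a runtime check decides whether an available accelerator can take it, and the same program still runs when no suitable accelerator exists.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple


@dataclass
class Loop:
    """Hypothetical model of an innermost loop expressed in baseline operations."""
    ops: List[str]               # baseline opcodes in the loop body (made-up names)
    body: Callable[[int], int]   # reference semantics of one iteration
    trip_count: int


@dataclass
class Accelerator:
    """Hypothetical loop accelerator: a set of supported ops and a size limit."""
    supported_ops: FrozenSet[str]
    max_ops: int

    def can_map(self, loop: Loop) -> bool:
        # Assumed mapping rule: the body must fit and use only supported ops.
        return len(loop.ops) <= self.max_ops and set(loop.ops) <= self.supported_ops


def run_baseline(loop: Loop) -> List[int]:
    """Execute the loop with its baseline semantics."""
    return [loop.body(i) for i in range(loop.trip_count)]


def run(loop: Loop, accel: Optional[Accelerator]) -> Tuple[str, List[int]]:
    """Pick a target at run time; the result is the same on either path."""
    if accel is not None and accel.can_map(loop):
        # Stand-in for accelerated execution: this sketch reuses the baseline
        # semantics, since only the dispatch decision is being illustrated.
        return "accelerator", run_baseline(loop)
    return "baseline", run_baseline(loop)


loop = Loop(ops=["load", "mul", "add", "store"], body=lambda i: i * 2 + 1, trip_count=4)
full = Accelerator(supported_ops=frozenset({"load", "mul", "add", "store"}), max_ops=8)
no_mul = Accelerator(supported_ops=frozenset({"load", "add", "store"}), max_ops=8)

print(run(loop, full))    # mapped to the accelerator
print(run(loop, no_mul))  # falls back to the baseline processor
print(run(loop, None))    # no accelerator present, still runs
```

The point of the design is that the binary never names an accelerator directly, so the fallback path is always available and new accelerators can be adopted without recompiling.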
