在指令集扩展中引入本地内存元素

Proceedings. 41st Design Automation Conference, 2004. Pub Date : 2004-06-07 DOI:10.1145/996566.996765

P. Biswas, Vinay Choudhary, K. Atasu, L. Pozzi, P. Ienne, N. Dutt

{"title":"在指令集扩展中引入本地内存元素","authors":"P. Biswas, Vinay Choudhary, K. Atasu, L. Pozzi, P. Ienne, N. Dutt","doi":"10.1145/996566.996765","DOIUrl":null,"url":null,"abstract":"Automatic generation of Instruction Set Extensions (ISEs), to be executed on a custom processing unit or a coprocessor is an important step towards processor customization. A typical goal of a manual designer is to combine a large number of atomic instructions into an ISE satisfying microarchitectural constraints. However, memory operations pose a challenge for previous ISE approaches by limiting the size of the resulting instruction. In this paper, we introduce memory elements into custom units which result in ISEs closer to those sought after by the designers. We consider two kinds of memory elements for mapping to the specialized hardware: small hardware tables and architecturally-visible state registers. We devised a genetic algorithm to specifically exploit opportunities of introducing memory elements during ISE generation. Finally, we demonstrate the effectiveness of our approach by a detailed study of the variation in performance, area and energy in the presence of the generated ISEs, on a number of MediaBench, EEMBC and cryptographic applications. With the introduction of memory, the average speedup varied from 2.7X to 5X depending on the architectural configuration with a nominal area overhead. Moreover, we obtained an average energy reduction of 26% with respect to a 32-KB cache.","PeriodicalId":115059,"journal":{"name":"Proceedings. 41st Design Automation Conference, 2004.","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"70","resultStr":"{\"title\":\"Introduction of local memory elements in instruction set extensions\",\"authors\":\"P. Biswas, Vinay Choudhary, K. Atasu, L. Pozzi, P. Ienne, N. Dutt\",\"doi\":\"10.1145/996566.996765\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatic generation of Instruction Set Extensions (ISEs), to be executed on a custom processing unit or a coprocessor is an important step towards processor customization. A typical goal of a manual designer is to combine a large number of atomic instructions into an ISE satisfying microarchitectural constraints. However, memory operations pose a challenge for previous ISE approaches by limiting the size of the resulting instruction. In this paper, we introduce memory elements into custom units which result in ISEs closer to those sought after by the designers. We consider two kinds of memory elements for mapping to the specialized hardware: small hardware tables and architecturally-visible state registers. We devised a genetic algorithm to specifically exploit opportunities of introducing memory elements during ISE generation. Finally, we demonstrate the effectiveness of our approach by a detailed study of the variation in performance, area and energy in the presence of the generated ISEs, on a number of MediaBench, EEMBC and cryptographic applications. With the introduction of memory, the average speedup varied from 2.7X to 5X depending on the architectural configuration with a nominal area overhead. Moreover, we obtained an average energy reduction of 26% with respect to a 32-KB cache.\",\"PeriodicalId\":115059,\"journal\":{\"name\":\"Proceedings. 41st Design Automation Conference, 2004.\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-06-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"70\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. 41st Design Automation Conference, 2004.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/996566.996765\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. 41st Design Automation Conference, 2004.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/996566.996765","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 70

摘要

指令集扩展(ISEs)的自动生成，将在自定义处理单元或协处理器上执行，是处理器自定义的重要一步。手动设计器的典型目标是将大量原子指令组合到满足微架构约束的ISE中。然而，内存操作通过限制结果指令的大小，对以前的ISE方法提出了挑战。在本文中，我们在定制单元中引入了存储元素，从而使ise更接近设计者所追求的目标。我们考虑两种用于映射到专用硬件的内存元素:小型硬件表和体系结构可见的状态寄存器。我们设计了一种遗传算法，专门利用在ISE生成过程中引入存储元素的机会。最后，我们通过在mediabbench、EEMBC和加密应用程序上对生成的ISEs存在的性能、面积和能量变化的详细研究来证明我们方法的有效性。随着内存的引入，平均加速从2.7倍到5倍不等，这取决于具有标称面积开销的体系结构配置。此外，相对于32kb的缓存，我们获得了26%的平均能耗降低。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Introduction of local memory elements in instruction set extensions

Automatic generation of Instruction Set Extensions (ISEs), to be executed on a custom processing unit or a coprocessor is an important step towards processor customization. A typical goal of a manual designer is to combine a large number of atomic instructions into an ISE satisfying microarchitectural constraints. However, memory operations pose a challenge for previous ISE approaches by limiting the size of the resulting instruction. In this paper, we introduce memory elements into custom units which result in ISEs closer to those sought after by the designers. We consider two kinds of memory elements for mapping to the specialized hardware: small hardware tables and architecturally-visible state registers. We devised a genetic algorithm to specifically exploit opportunities of introducing memory elements during ISE generation. Finally, we demonstrate the effectiveness of our approach by a detailed study of the variation in performance, area and energy in the presence of the generated ISEs, on a number of MediaBench, EEMBC and cryptographic applications. With the introduction of memory, the average speedup varied from 2.7X to 5X depending on the architectural configuration with a nominal area overhead. Moreover, we obtained an average energy reduction of 26% with respect to a 32-KB cache.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings. 41st Design Automation Conference, 2004.

自引率

0.00%

发文量