MinDeg:运行时可重构加速器的性能导向替换策略

International Conference on Hardware/Software Codesign and System Synthesis Pub Date : 2009-10-11 DOI:10.1145/1629435.1629481

L. Bauer, M. Shafique, J. Henkel

{"title":"MinDeg:运行时可重构加速器的性能导向替换策略","authors":"L. Bauer, M. Shafique, J. Henkel","doi":"10.1145/1629435.1629481","DOIUrl":null,"url":null,"abstract":"Reconfigurable Processors utilize a reconfigurable fabric (to implement application-specific accelerators) and may perform run-time reconfigurations to exchange the set of deployed accelerators during application execution. Depending on the application requirements, the high utilization of the reconfigurable fabric (due to run-time reconfiguration) leads to a performance improvement compared to non-reconfigurable application-specific processors (ASIPs). However, as the reconfiguration time of fine-grained reconfigurable fabrics (i.e. FPGA-like structures) is rather long (in the range of milliseconds), it is crucial to avoid frequent cycles of reconfiguration-replacement-reconfiguration of the accelerators in order to exploit the real benefits of Reconfigurable Processors. Similar to memory caches, a replacement policy has to decide which reconfigurable accelerators shall be replaced in order to reconfigure additional accelerators. In the case that a recently replaced accelerator is demanded again, the reconfiguration delay might noticeably increase the application execution time.\n In this paper, we demonstrate that well-known policies for cache and page replacement (typically also used in state-of-the-art Reconfigurable Processors) are not generally suitable to replace reconfigurable accelerators.\n We therefore propose our novel performance-guided Minimum Degradation (MinDeg) replacement policy that particularly targets Reconfigurable Processors and replaces reconfigurable accelerators at run time. It accounts for the performance penalty that occurs due to replacement of a certain accelerator. Comparisons with the most-prominent replacement policies show the superiority of our approach. We evaluate and compare MinDeg for a wide range of different reconfiguration bandwidths and reconfigurable fabric sizes and achieve a speedup of up to 2.26x (1.74x compared to the widely used LRU policy). The introduced overhead to achieve this speedup is minor in comparison to the obtained application acceleration, i.e. the highest observed overhead (to calculate our MinDeg replacement policy) affected the obtained application acceleration by only 0.30%. A parallel hardware implementation of our MinDeg algorithm demands only 4,440 gate equivalents, which corresponds to 64% of the average requirements of one real-world reconfigurable accelerator (note: multiple accelerators are demanded per kernel). However, our MinDeg policy does not rely on hardware support, i.e. a trade-off between the hardware requirements and the acceleration is possible.","PeriodicalId":300268,"journal":{"name":"International Conference on Hardware/Software Codesign and System Synthesis","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"MinDeg: a performance-guided replacement policy for run-time reconfigurable accelerators\",\"authors\":\"L. Bauer, M. Shafique, J. Henkel\",\"doi\":\"10.1145/1629435.1629481\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reconfigurable Processors utilize a reconfigurable fabric (to implement application-specific accelerators) and may perform run-time reconfigurations to exchange the set of deployed accelerators during application execution. Depending on the application requirements, the high utilization of the reconfigurable fabric (due to run-time reconfiguration) leads to a performance improvement compared to non-reconfigurable application-specific processors (ASIPs). However, as the reconfiguration time of fine-grained reconfigurable fabrics (i.e. FPGA-like structures) is rather long (in the range of milliseconds), it is crucial to avoid frequent cycles of reconfiguration-replacement-reconfiguration of the accelerators in order to exploit the real benefits of Reconfigurable Processors. Similar to memory caches, a replacement policy has to decide which reconfigurable accelerators shall be replaced in order to reconfigure additional accelerators. In the case that a recently replaced accelerator is demanded again, the reconfiguration delay might noticeably increase the application execution time.\\n In this paper, we demonstrate that well-known policies for cache and page replacement (typically also used in state-of-the-art Reconfigurable Processors) are not generally suitable to replace reconfigurable accelerators.\\n We therefore propose our novel performance-guided Minimum Degradation (MinDeg) replacement policy that particularly targets Reconfigurable Processors and replaces reconfigurable accelerators at run time. It accounts for the performance penalty that occurs due to replacement of a certain accelerator. Comparisons with the most-prominent replacement policies show the superiority of our approach. We evaluate and compare MinDeg for a wide range of different reconfiguration bandwidths and reconfigurable fabric sizes and achieve a speedup of up to 2.26x (1.74x compared to the widely used LRU policy). The introduced overhead to achieve this speedup is minor in comparison to the obtained application acceleration, i.e. the highest observed overhead (to calculate our MinDeg replacement policy) affected the obtained application acceleration by only 0.30%. A parallel hardware implementation of our MinDeg algorithm demands only 4,440 gate equivalents, which corresponds to 64% of the average requirements of one real-world reconfigurable accelerator (note: multiple accelerators are demanded per kernel). However, our MinDeg policy does not rely on hardware support, i.e. a trade-off between the hardware requirements and the acceleration is possible.\",\"PeriodicalId\":300268,\"journal\":{\"name\":\"International Conference on Hardware/Software Codesign and System Synthesis\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-10-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Conference on Hardware/Software Codesign and System Synthesis\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1629435.1629481\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Hardware/Software Codesign and System Synthesis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1629435.1629481","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 8

摘要

可重构处理器利用可重构结构(实现特定于应用程序的加速器)，并且可以在应用程序执行期间执行运行时重新配置以交换部署的加速器集。根据应用程序需求，可重构结构的高利用率(由于运行时重新配置)与不可重构的特定于应用程序的处理器(asip)相比，可以提高性能。然而，由于细粒度可重构结构(即类fpga结构)的重构时间相当长(在毫秒范围内)，为了利用可重构处理器的真正优势，避免加速器的频繁重构-替换-重构循环至关重要。与内存缓存类似，替换策略必须决定替换哪些可重新配置的加速器，以便重新配置其他加速器。如果再次需要最近替换的加速器，重新配置延迟可能会显著增加应用程序的执行时间。在本文中，我们证明了众所周知的缓存和页面替换策略(通常也用于最先进的可重构处理器)通常不适合替换可重构加速器。因此，我们提出了新的以性能为导向的最小退化(MinDeg)替换策略，该策略特别针对可重构处理器，并在运行时替换可重构加速器。它解释了由于更换某个加速器而产生的性能损失。与最突出的替代政策相比，我们的方法具有优越性。我们评估和比较了MinDeg在各种不同的重构带宽和重构结构尺寸上的加速，并实现了高达2.26倍的加速(与广泛使用的LRU策略相比为1.74倍)。与获得的应用程序加速相比，实现此加速所引入的开销很小，即观察到的最高开销(用于计算我们的MinDeg替换策略)仅对获得的应用程序加速产生0.30%的影响。我们的MinDeg算法的并行硬件实现只需要4440个等效的门，这相当于一个真实世界可重构加速器平均需求的64%(注意:每个内核需要多个加速器)。然而，我们的MinDeg策略不依赖于硬件支持，也就是说，在硬件需求和加速之间进行权衡是可能的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

MinDeg: a performance-guided replacement policy for run-time reconfigurable accelerators

Reconfigurable Processors utilize a reconfigurable fabric (to implement application-specific accelerators) and may perform run-time reconfigurations to exchange the set of deployed accelerators during application execution. Depending on the application requirements, the high utilization of the reconfigurable fabric (due to run-time reconfiguration) leads to a performance improvement compared to non-reconfigurable application-specific processors (ASIPs). However, as the reconfiguration time of fine-grained reconfigurable fabrics (i.e. FPGA-like structures) is rather long (in the range of milliseconds), it is crucial to avoid frequent cycles of reconfiguration-replacement-reconfiguration of the accelerators in order to exploit the real benefits of Reconfigurable Processors. Similar to memory caches, a replacement policy has to decide which reconfigurable accelerators shall be replaced in order to reconfigure additional accelerators. In the case that a recently replaced accelerator is demanded again, the reconfiguration delay might noticeably increase the application execution time. In this paper, we demonstrate that well-known policies for cache and page replacement (typically also used in state-of-the-art Reconfigurable Processors) are not generally suitable to replace reconfigurable accelerators. We therefore propose our novel performance-guided Minimum Degradation (MinDeg) replacement policy that particularly targets Reconfigurable Processors and replaces reconfigurable accelerators at run time. It accounts for the performance penalty that occurs due to replacement of a certain accelerator. Comparisons with the most-prominent replacement policies show the superiority of our approach. We evaluate and compare MinDeg for a wide range of different reconfiguration bandwidths and reconfigurable fabric sizes and achieve a speedup of up to 2.26x (1.74x compared to the widely used LRU policy). The introduced overhead to achieve this speedup is minor in comparison to the obtained application acceleration, i.e. the highest observed overhead (to calculate our MinDeg replacement policy) affected the obtained application acceleration by only 0.30%. A parallel hardware implementation of our MinDeg algorithm demands only 4,440 gate equivalents, which corresponds to 64% of the average requirements of one real-world reconfigurable accelerator (note: multiple accelerators are demanded per kernel). However, our MinDeg policy does not rely on hardware support, i.e. a trade-off between the hardware requirements and the acceleration is possible.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Conference on Hardware/Software Codesign and System Synthesis

自引率

0.00%

发文量