Hardware-accelerated cache simulation for multicore by FPGA

Proceedings of the 2018 Conference on Research in Adaptive and Convergent Systems. Published: 2018-10-09. DOI: 10.1145/3264746.3264766

Shih-Hao Hung, Yi-Mo Ho, C. Yeh, C. Liu, Chen-Pang Lee

Citations: 1

Abstract

Developers often use a virtual platform to develop software before the target hardware is available. For software optimization, it is important to profile an application's cache misses in a realistic operating environment on that virtual platform. In the multicore era, however, coherence cache misses are hard to simulate at high speed. In this paper, we propose a hardware-accelerated architecture that simulates the cache misses of a multicore system. We implement the cache miss simulator on an FPGA that works alongside a virtual platform, so users can profile their software as it runs on the simulated multicore system.

The evaluation shows a throughput of 65 MB of trace log per second with the FPGA clocked at 100 MHz, using about 570,000 logic elements to simulate four L1 caches and one L2 cache for a multicore system with four virtual CPUs. Compared with Dinero IV, a popular cache miss simulator, the system achieves a 1.6x to 2x speedup, even though Dinero IV does less work and does not model coherence cache misses in multicore systems. These results show that a hardware-accelerated architecture on an FPGA can substantially speed up cache miss simulation for multicore systems.
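
The abstract does not describe the simulator's internal design, so the following is only a software sketch of the bookkeeping a trace-driven multicore cache-miss simulator needs: one private L1 cache per core, a shared L2, and a write-invalidate step so that a later L1 miss on an invalidated block can be counted as a coherence miss. All names (`SetAssocCache`, `MulticoreCacheSim`), the cache geometries, 64-byte blocks, LRU replacement, the simplified invalidation rule (presence only, no MSI/MESI states), and the `(core, address, is_write)` trace format are hypothetical assumptions, not the authors' FPGA design or Dinero IV's behavior.

```python
# Hypothetical sketch: a software model of multicore cache-miss classification,
# not the FPGA architecture from the paper. Geometries and protocol are assumed.
from collections import OrderedDict


class SetAssocCache:
    """Set-associative cache with LRU replacement that tracks block numbers only."""

    def __init__(self, num_sets, ways, block_size):
        self.num_sets = num_sets
        self.ways = ways
        self.block_size = block_size
        # One OrderedDict per set: keys are block numbers, ordered from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _locate(self, addr):
        block = addr // self.block_size
        return block, self.sets[block % self.num_sets]

    def lookup(self, addr):
        """Return True on a hit and mark the block most recently used."""
        block, s = self._locate(addr)
        if block in s:
            s.move_to_end(block)
            return True
        return False

    def fill(self, addr):
        """Insert a block, evicting the least recently used one if the set is full."""
        block, s = self._locate(addr)
        if block in s:
            s.move_to_end(block)
            return
        if len(s) >= self.ways:
            s.popitem(last=False)
        s[block] = True

    def invalidate(self, addr):
        """Drop a block; return True if it was present."""
        block, s = self._locate(addr)
        return s.pop(block, None) is not None


class MulticoreCacheSim:
    """Private L1 per core, one shared (non-inclusive) L2, simplified write-invalidate."""

    def __init__(self, cores=4, l1=(64, 2), l2=(512, 8), block_size=64):
        self.block_size = block_size
        self.l1 = [SetAssocCache(*l1, block_size) for _ in range(cores)]
        self.l2 = SetAssocCache(*l2, block_size)
        self.seen = [set() for _ in range(cores)]         # blocks each core has ever cached
        self.invalidated = [set() for _ in range(cores)]  # blocks removed by another core's write
        self.stats = [dict(hits=0, cold=0, coherence=0, other=0) for _ in range(cores)]
        self.l2_misses = 0

    def access(self, core, addr, is_write):
        block = addr // self.block_size
        st = self.stats[core]
        if self.l1[core].lookup(addr):
            st["hits"] += 1
        else:
            # Classify the L1 miss before refilling the block.
            if block not in self.seen[core]:
                st["cold"] += 1
            elif block in self.invalidated[core]:
                st["coherence"] += 1
            else:
                st["other"] += 1  # capacity or conflict miss
            self.seen[core].add(block)
            self.invalidated[core].discard(block)
            if not self.l2.lookup(addr):
                self.l2_misses += 1
                self.l2.fill(addr)
            self.l1[core].fill(addr)
        if is_write:
            # Write-invalidate: remove the block from every other core's L1.
            for other, cache in enumerate(self.l1):
                if other != core and cache.invalidate(addr):
                    self.invalidated[other].add(block)


# Hypothetical trace of (core, address, is_write) records.
trace = [(0, 0x1000, False), (1, 0x1000, False), (1, 0x1000, True), (0, 0x1000, False)]
sim = MulticoreCacheSim()
for core, addr, is_write in trace:
    sim.access(core, addr, is_write)
print(sim.stats[0])   # {'hits': 0, 'cold': 1, 'coherence': 1, 'other': 0}
print(sim.stats[1])   # {'hits': 1, 'cold': 1, 'coherence': 0, 'other': 0}
print(sim.l2_misses)  # 1
```

Keeping per-core `seen` and `invalidated` sets is what separates coherence misses from cold and capacity/conflict misses; a single-cache model has no invalidation step, so it cannot report that category. For scale, the reported 65 MB of trace per second at 100 MHz works out to about 0.65 bytes of trace per FPGA clock cycle, taking 1 MB as 10^6 bytes.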
