Shih-Hao Hung, Yi-Mo Ho, C. Yeh, C. Liu, Chen-Pang Lee
Proceedings of the 2018 Conference on Research in Adaptive and Convergent Systems, 2018-10-09. DOI: https://doi.org/10.1145/3264746.3264766
Hardware-accelerated cache simulation for multicore by FPGA
Developers often use a virtual platform to develop software before the hardware is available. To optimize that software, it is important to profile the cache misses of applications in a realistic operating environment on the virtual platform. For multicore systems, however, simulating coherence misses at high speed is difficult. In this paper, we propose a hardware-accelerated architecture for simulating the cache misses of a multicore system, and we implement the cache miss simulator on top of a virtual platform using an FPGA. Users can profile their software as if it were running on the multicore system. In our evaluation, the simulator processes 65 MB of trace log per second with the FPGA clocked at 100 MHz, using about 570,000 logic elements to simulate four L1 caches and one L2 cache for a multicore system with 4 virtual CPUs. This is 1.6 to 2 times faster than the popular cache miss simulator Dinero IV, even though Dinero IV does less work and does not support coherence misses in a multicore system. The results show that an FPGA-based hardware-accelerated architecture offers a clear advantage in speeding up cache miss simulation for multicore systems.
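To make the notion of a coherence miss concrete, the following is a minimal software sketch of trace-driven cache simulation for four virtual CPUs. It is not the FPGA design described above. The trace format `(cpu, op, addr)`, the cache geometry (`LINE_SIZE`, `NUM_SETS`, `WAYS`), the LRU replacement policy, and the simple write-invalidate rule are all assumptions chosen for illustration, and the L2 cache is omitted for brevity.

```python
from collections import OrderedDict

# Hypothetical cache geometry; the abstract does not specify these values.
LINE_SIZE = 64   # bytes per cache line
NUM_SETS = 64    # sets per L1 cache
WAYS = 2         # associativity


class L1Cache:
    """A small set-associative cache with LRU replacement."""

    def __init__(self):
        # One OrderedDict per set; keys are line numbers, order tracks recency.
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]
        # Lines removed by another CPU's write, used to classify later misses.
        self.invalidated = set()

    def _locate(self, addr):
        line = addr // LINE_SIZE
        return line, self.sets[line % NUM_SETS]

    def access(self, addr):
        """Return 'hit', 'miss', or 'coherence_miss', filling the line on a miss."""
        line, cache_set = self._locate(addr)
        if line in cache_set:
            cache_set.move_to_end(line)  # mark as most recently used
            return "hit"
        # Simplification: any miss on a previously invalidated line is
        # counted as a coherence miss.
        kind = "coherence_miss" if line in self.invalidated else "miss"
        self.invalidated.discard(line)
        if len(cache_set) >= WAYS:
            cache_set.popitem(last=False)  # evict the least recently used line
        cache_set[line] = True
        return kind

    def invalidate(self, addr):
        """Drop the line if present and remember that coherence removed it."""
        line, cache_set = self._locate(addr)
        if line in cache_set:
            del cache_set[line]
            self.invalidated.add(line)


def simulate(trace, num_cpus=4):
    """Replay (cpu, op, addr) records; op is 'R' or 'W'. Returns per-CPU counts."""
    caches = [L1Cache() for _ in range(num_cpus)]
    stats = [{"hit": 0, "miss": 0, "coherence_miss": 0} for _ in range(num_cpus)]
    for cpu, op, addr in trace:
        stats[cpu][caches[cpu].access(addr)] += 1
        if op == "W":
            # Assumed write-invalidate rule: a write removes all other copies.
            for other, cache in enumerate(caches):
                if other != cpu:
                    cache.invalidate(addr)
    return stats


if __name__ == "__main__":
    demo_trace = [
        (0, "R", 0x1000),
        (1, "R", 0x1000),
        (0, "W", 0x1000),  # invalidates CPU 1's copy
        (1, "R", 0x1000),  # CPU 1 misses because of CPU 0's write
    ]
    for cpu, counts in enumerate(simulate(demo_trace)):
        print(cpu, counts)
```

A single-core simulator that replays each CPU's trace in isolation cannot produce the `coherence_miss` count, because that count depends on the interleaving of accesses across CPUs. This is the extra work the abstract notes Dinero IV does not perform.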