{"title":"快速指令缓存模拟比你想象的要棘手","authors":"M. Badaroux, J. Dumas, F. Pétrot","doi":"10.1145/3579170.3579261","DOIUrl":null,"url":null,"abstract":"Given the performances it achieves, dynamic binary translation is the most compelling simulation approach for cross-emulation of software centric systems. This speed comes at a cost: simulation is purely functional. Modeling instruction caches by instrumenting each target instruction is feasible, but severely degrades performances. As the translation occurs per target instruction block, we propose to model instruction caches at that granularity. This raises a few issues that we detail and mitigate. We implement this solution in the QEMU dynamic binary translation engine, which brings up an interesting problem inherent to this simulation strategy. Using as test vehicle a multicore RISC-V based platform, we show that a proper model can be nearly as accurate as an instruction accurate model. On the PolyBench/C and PARSEC benchmarks, our model slows down simulation by a factor of 2 to 10 compared to vanilla QEMU. Although not negligible, this is to be balanced with the factor of 20 to 60 for the instruction accurate approach.","PeriodicalId":153341,"journal":{"name":"Proceedings of the DroneSE and RAPIDO: System Engineering for constrained embedded systems","volume":"45 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Fast Instruction Cache Simulation is Trickier than You Think\",\"authors\":\"M. Badaroux, J. Dumas, F. Pétrot\",\"doi\":\"10.1145/3579170.3579261\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Given the performances it achieves, dynamic binary translation is the most compelling simulation approach for cross-emulation of software centric systems. This speed comes at a cost: simulation is purely functional. Modeling instruction caches by instrumenting each target instruction is feasible, but severely degrades performances. As the translation occurs per target instruction block, we propose to model instruction caches at that granularity. This raises a few issues that we detail and mitigate. We implement this solution in the QEMU dynamic binary translation engine, which brings up an interesting problem inherent to this simulation strategy. Using as test vehicle a multicore RISC-V based platform, we show that a proper model can be nearly as accurate as an instruction accurate model. On the PolyBench/C and PARSEC benchmarks, our model slows down simulation by a factor of 2 to 10 compared to vanilla QEMU. 
Although not negligible, this is to be balanced with the factor of 20 to 60 for the instruction accurate approach.\",\"PeriodicalId\":153341,\"journal\":{\"name\":\"Proceedings of the DroneSE and RAPIDO: System Engineering for constrained embedded systems\",\"volume\":\"45 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the DroneSE and RAPIDO: System Engineering for constrained embedded systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3579170.3579261\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the DroneSE and RAPIDO: System Engineering for constrained embedded systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3579170.3579261","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Given the performance it achieves, dynamic binary translation is the most compelling simulation approach for the cross-emulation of software-centric systems. This speed comes at a cost: the simulation is purely functional. Modeling instruction caches by instrumenting each target instruction is feasible, but it severely degrades performance. Since translation occurs per target instruction block, we propose to model instruction caches at that granularity. This raises a few issues that we detail and mitigate. We implement this solution in the QEMU dynamic binary translation engine, which brings up an interesting problem inherent to this simulation strategy. Using a multicore RISC-V-based platform as a test vehicle, we show that a proper block-level model can be nearly as accurate as an instruction-accurate one. On the PolyBench/C and PARSEC benchmarks, our model slows down simulation by a factor of 2 to 10 compared to vanilla QEMU. Although not negligible, this is to be balanced against the factor of 20 to 60 for the instruction-accurate approach.
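To make the block-granularity idea concrete, the sketch below models a small instruction cache that is probed once per executed translation block rather than once per target instruction. This is not the authors' QEMU implementation: the cache geometry, the direct-mapped organization, and the icache_access_block() hook are illustrative assumptions, and the sketch ignores the corner cases the paper addresses (e.g. blocks entered in the middle, or blocks retranslated between lookups).

/*
 * Minimal sketch of a per-block instruction cache model (illustrative only;
 * not the authors' QEMU code). Direct-mapped, write-free, statistics only.
 */
#include <stdbool.h>
#include <stdint.h>

#define LINE_SIZE   64u                 /* bytes per cache line (assumed)  */
#define NUM_LINES   512u                /* 32 KiB direct-mapped (assumed)  */

typedef struct {
    uint64_t tags[NUM_LINES];
    bool     valid[NUM_LINES];
    uint64_t hits, misses;
} icache_t;

/* Probe one cache line; returns true on a hit, fills the line on a miss. */
static bool icache_access_line(icache_t *c, uint64_t paddr)
{
    uint64_t line = paddr / LINE_SIZE;
    uint32_t idx  = line % NUM_LINES;

    if (c->valid[idx] && c->tags[idx] == line) {
        c->hits++;
        return true;
    }
    c->valid[idx] = true;
    c->tags[idx]  = line;
    c->misses++;
    return false;
}

/*
 * Hypothetical hook called once per executed translation block: the block is
 * described only by its start address and size, and every cache line it spans
 * is touched. Per-instruction instrumentation would instead call the probe on
 * every target instruction, which is what makes it so much slower.
 */
void icache_access_block(icache_t *c, uint64_t start_paddr, uint32_t size)
{
    uint64_t first = start_paddr / LINE_SIZE;
    uint64_t last  = (start_paddr + size - 1) / LINE_SIZE;

    for (uint64_t line = first; line <= last; line++) {
        icache_access_line(c, line * LINE_SIZE);
    }
}

The key trade-off this illustrates is the one the abstract quantifies: one model call per block keeps the overhead within a small factor of vanilla QEMU, whereas one call per instruction multiplies the simulation time by an order of magnitude more.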