Ruoyu Zhou, George Wort, Márton Erdos, Timothy M. Jones
{"title":"双面三元组:通过动态二进制修改开发并行性","authors":"Ruoyu Zhou, George Wort, Márton Erdos, Timothy M. Jones","doi":"10.1145/3313808.3313812","DOIUrl":null,"url":null,"abstract":"We present a unified approach for exploiting thread-level, data-level, and memory-level parallelism through a same-ISA dynamic binary modifier guided by static binary analysis. A static binary analyser first examines an executable and determines the operations required to extract parallelism at runtime, encoding them as a series of rewrite rules that a dynamic binary modifier uses to perform binary transformation. We demonstrate this framework by exploiting three different kinds of parallelism to perform automatic vectorisation, software prefetching, and automatic parallelisation together on legacy application binaries. Software prefetch insertion alone achieves an average speedup of 1.2x, comparing favourably with an automatic compiler pass. Automatic vectorisation brings speedups of 2.7x on the TSVC benchmarks, significantly beating a compiler approach for some workloads. Finally, combining prefetching, vectorisation, and parallelisation realises a speedup of 3.8x on a representative application loop.","PeriodicalId":350040,"journal":{"name":"Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"The janus triad: exploiting parallelism through dynamic binary modification\",\"authors\":\"Ruoyu Zhou, George Wort, Márton Erdos, Timothy M. Jones\",\"doi\":\"10.1145/3313808.3313812\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present a unified approach for exploiting thread-level, data-level, and memory-level parallelism through a same-ISA dynamic binary modifier guided by static binary analysis. A static binary analyser first examines an executable and determines the operations required to extract parallelism at runtime, encoding them as a series of rewrite rules that a dynamic binary modifier uses to perform binary transformation. We demonstrate this framework by exploiting three different kinds of parallelism to perform automatic vectorisation, software prefetching, and automatic parallelisation together on legacy application binaries. Software prefetch insertion alone achieves an average speedup of 1.2x, comparing favourably with an automatic compiler pass. Automatic vectorisation brings speedups of 2.7x on the TSVC benchmarks, significantly beating a compiler approach for some workloads. Finally, combining prefetching, vectorisation, and parallelisation realises a speedup of 3.8x on a representative application loop.\",\"PeriodicalId\":350040,\"journal\":{\"name\":\"Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-04-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3313808.3313812\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3313808.3313812","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The janus triad: exploiting parallelism through dynamic binary modification
We present a unified approach for exploiting thread-level, data-level, and memory-level parallelism through a same-ISA dynamic binary modifier guided by static binary analysis. A static binary analyser first examines an executable and determines the operations required to extract parallelism at runtime, encoding them as a series of rewrite rules that a dynamic binary modifier uses to perform binary transformation. We demonstrate this framework by exploiting three different kinds of parallelism to perform automatic vectorisation, software prefetching, and automatic parallelisation together on legacy application binaries. Software prefetch insertion alone achieves an average speedup of 1.2x, comparing favourably with an automatic compiler pass. Automatic vectorisation brings speedups of 2.7x on the TSVC benchmarks, significantly beating a compiler approach for some workloads. Finally, combining prefetching, vectorisation, and parallelisation realises a speedup of 3.8x on a representative application loop.