An Evaluation of Memory Sharing Performance for Heterogeneous Embedded SoCs with Many-Core Accelerators
Pirmin Vogel, A. Marongiu, L. Benini
Proceedings of the 2015 International Workshop on Code Optimisation for Multi and Many Cores, 2015-02-08
https://doi.org/10.1145/2723772.2723775
Today's systems-on-chip (SoCs) increasingly conform to the models envisioned by the Heterogeneous System Architecture (HSA) Foundation, in which massively parallel, programmable many-core accelerators (PMCAs) not only cooperate but also coherently share memory with a powerful, multi-core host processor. Allowing direct access to system memory from both sides greatly simplifies application development, but it also increases the potential for the PMCA to interfere with the memory system. In this work, we evaluate the impact of a PMCA's memory traffic on host performance using the Xilinx Zynq-7000 SoC. This platform features a dual-core ARM Cortex-A9 CPU as well as a field-programmable gate array (FPGA), which we use to model a PMCA. Synthetic workloads, real benchmarks from the MiBench and ALPBench suites, and collaborative workloads all show that the interference generated by the PMCA can significantly reduce the memory bandwidth seen by the host (on average up to 25% for host applications).
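To make the headline figure concrete, the sketch below shows one common way a relative bandwidth reduction can be computed from two measurements: the bandwidth the host sees when running alone and the bandwidth it sees while the accelerator generates traffic. The abstract does not state the exact metric definition used, so this formula is an assumption; the function name `bandwidth_reduction_percent` and the input values are hypothetical and are not measurements from the paper.

```python
def bandwidth_reduction_percent(baseline_mb_s: float, contended_mb_s: float) -> float:
    """Relative drop in host-visible memory bandwidth, as a percentage of the baseline.

    baseline_mb_s:  bandwidth measured with the host running alone (assumed unit: MB/s)
    contended_mb_s: bandwidth measured while the accelerator also accesses memory
    """
    if baseline_mb_s <= 0:
        raise ValueError("baseline bandwidth must be positive")
    return 100.0 * (baseline_mb_s - contended_mb_s) / baseline_mb_s


# Illustrative values only, not results from the paper.
reduction = bandwidth_reduction_percent(800.0, 600.0)
print(f"{reduction:.1f} %")  # prints "25.0 %"
```

A positive result means the host lost bandwidth under contention; a result of zero means the accelerator traffic had no measurable effect.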