Improving energy efficiency by transparently sharing SIMD Execution Units in Asymmetric Multicores
Caio Vieira, Antonio Carlos Schneider Beck
2021 34th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI), published 2021-08-23
DOI: 10.1109/SBCCI53441.2021.9529982 (https://doi.org/10.1109/SBCCI53441.2021.9529982)
Single-ISA asymmetric multicore architectures (e.g., ARM big.LITTLE) combine high performance and energy efficiency on the same chip by providing cores with different microarchitectures, so that applications can transparently migrate from one to another as needed. However, in such architectures the big core features resource-expensive Execution Units (EUs) to support ISA extensions, such as SIMD and FP, which may rarely be used depending on the application at hand. The little core supports these same extensions, but with power-efficient EUs. In this work, we therefore propose a decoupled offloading mechanism that allows the big core to use the power-efficient EUs of the little core while its own are power-gated, maintaining the original migration transparency of the architecture. Since applications go through different phases that use extension instructions more or less heavily, we also propose an arbiter that decides at runtime when to activate the decoupled offloading. We evaluate our technique with ARM NEON as the ISA extension and the ARM Cortex-A7 and Cortex-A15 as the little and big cores, respectively. On average, our approach provides energy improvements of 15.9% at the cost of a 2.2% time overhead for the MiBench benchmarks, which represent embedded application workloads, and energy gains of 6.4% at the cost of a 1.1% time overhead for the PolyBench benchmarks, which have high NEON usage.
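
The abstract does not specify the policy the arbiter uses, so the following is only a minimal sketch of how a phase-based runtime decision could look. It assumes the arbiter samples the share of NEON instructions per fixed interval and switches modes with two thresholds (hysteresis) to avoid toggling on every small change. The class name `OffloadArbiter`, the threshold values, and the synthetic trace are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class OffloadArbiter:
    """Hypothetical phase-based arbiter; thresholds are illustrative, not from the paper."""
    enable_below: float = 0.05   # assumed: start offloading when NEON share drops below 5%
    disable_above: float = 0.20  # assumed: stop offloading when NEON share exceeds 20%
    offloading: bool = False     # True means the big core's NEON EUs are power-gated

    def update(self, neon_instrs: int, total_instrs: int) -> bool:
        """Consume one interval's instruction counts and return the new mode."""
        share = neon_instrs / total_instrs if total_instrs else 0.0
        if not self.offloading and share < self.enable_below:
            self.offloading = True    # low NEON usage: use the little core's EUs
        elif self.offloading and share > self.disable_above:
            self.offloading = False   # NEON-heavy phase: wake the big core's EUs
        return self.offloading


def simulate(intervals):
    """Run a fresh arbiter over (neon_instrs, total_instrs) pairs, one per interval."""
    arbiter = OffloadArbiter()
    return [arbiter.update(neon, total) for neon, total in intervals]


if __name__ == "__main__":
    # Synthetic trace: scalar phase, moderate phase, NEON-heavy phase, scalar phase.
    trace = [(10, 1000), (100, 1000), (400, 1000), (20, 1000)]
    print(simulate(trace))  # [True, True, False, True]
```

The gap between the two thresholds is a design choice of this sketch: in the moderate phase (10% NEON share) the arbiter keeps whatever mode it is already in, which limits how often the big core's units would be power-gated and woken up again.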