Power-Aware Performance Adaptation of Concurrent Applications in Heterogeneous Many-Core Systems
Ali Aalsaud, R. Shafik, A. Rafiev, Fei Xia, Sheng Yang, A. Yakovlev
Proceedings of the 2016 International Symposium on Low Power Electronics and Design, 2016-08-08
DOI: 10.1145/2934583.2934612 (https://doi.org/10.1145/2934583.2934612)
Citations: 44
Abstract
Modern embedded systems execute multiple applications, both sequentially and concurrently. These applications run on heterogeneous platforms and generate varying power consumption and system workloads (CPU-intensive, memory-intensive, or both). As a result, determining the most energy-efficient system configuration (i.e., the number of parallel threads, their core allocations, and operating frequencies) for each kind of workload and application scenario is extremely challenging. In this paper, we propose a novel runtime optimization approach that aims to maximize power-normalized performance under dynamic variation of workload and application scenarios. Fundamental to this approach is a comprehensive study of the tradeoffs between inter-application concurrency, performance, and power consumption under different system configurations. Using real experimental measurements on an Odroid XU-3 heterogeneous platform with a number of PARSEC benchmark applications, we model power-normalized performance (in terms of IPS/Watt), underpinned by analytical power and performance models derived through multivariate linear regression (MLR). Using these models, we show that, as the number of concurrent applications increases, CPU-intensive applications show variable gains in IPS/Watt compared to memory-intensive applications in both sequential and concurrent application scenarios. Furthermore, we demonstrate that it is possible to continuously adapt the system configuration through a low-cost, linear-complexity runtime algorithm, which can improve IPS/Watt by up to 125% compared to the existing approach.
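The general idea of fitting power and performance models by multivariate linear regression and then choosing the configuration with the highest IPS/Watt can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' models or runtime algorithm: the predictors (`n_little`, `n_big`, `freq_ghz`), the "true" coefficients, and the noise-free synthetic observations are all hypothetical, and the selection step is a plain single pass over a candidate list.

```python
# Illustrative sketch only: predictors, coefficients, and data are invented
# for this example and are not taken from the paper.

def fit_mlr(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Each row holds the predictor values; an intercept column is prepended.
    Returns [intercept, b1, b2, ...].
    """
    xs = [[1.0] + list(r) for r in rows]
    k = len(xs[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(x[i] * x[j] for x in xs) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(xs, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def predict(coeffs, row):
    """Evaluate a fitted linear model at one configuration."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))


def ips_per_watt(ips_model, power_model, config):
    """Power-normalized performance predicted by the two linear models."""
    return predict(ips_model, config) / predict(power_model, config)


# Hypothetical ground truth used only to generate synthetic observations.
true_power = [0.5, 0.2, 0.9, 1.0]   # watts: intercept, n_little, n_big, freq_ghz
true_ips = [0.1, 0.4, 1.0, 0.8]     # giga-instructions per second

# Candidate configurations: (n_little, n_big, freq_ghz).
configs = [(l, b, f)
           for l in range(1, 5)
           for b in range(0, 5)
           for f in (0.6, 1.0, 1.4)]

# Noise-free synthetic "measurements"; real measurements would be noisy.
power_obs = [predict(true_power, c) for c in configs]
ips_obs = [predict(true_ips, c) for c in configs]

power_model = fit_mlr(configs, power_obs)
ips_model = fit_mlr(configs, ips_obs)

# One pass over the candidates, linear in the number of configurations.
best = max(configs, key=lambda c: ips_per_watt(ips_model, power_model, c))
print("best configuration (n_little, n_big, freq_ghz):", best)
```

Because the synthetic data are generated without noise, the fitted coefficients match the hypothetical ground truth up to rounding error; with measured data the fit would only approximate the underlying behaviour, and the choice of predictors would need to be validated against the platform.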