When less is more (LIMO): controlled parallelism for improved efficiency
Gaurav Chadha, S. Mahlke, S. Narayanasamy
International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, 2012-10-07
DOI: 10.1145/2380403.2380431 (https://doi.org/10.1145/2380403.2380431)
Citations: 30
Abstract
While developing shared-memory programs, programmers often contend with the problem of how many threads to create for best efficiency. Creating as many threads as the number of available processor cores, or more, may not be the most efficient configuration. Too many threads can result in excessive contention for shared resources, wasting energy, which is of primary concern for embedded devices. Furthermore, thermal and power constraints prevent us from operating all the processor cores at the highest possible frequency, favoring fewer threads. The best number of threads to run depends on the application, the user input, and the hardware resources available. It can also change at runtime, making it infeasible for the programmer to determine this number in advance.
To address this problem, we propose LIMO, a runtime system that dynamically manages the number of running threads of an application to maximize performance and energy efficiency. LIMO monitors threads' progress along with the usage of shared hardware resources to determine the best number of threads to run and the voltage and frequency level. With dynamic adaptation, LIMO provides an average of 21% performance improvement and a 2x improvement in energy efficiency on a 32-core system over the default configuration of 32 threads for a set of concurrent applications from the PARSEC suite, the Apache web server, and the Sphinx speech recognition system.
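The general idea of adapting the thread count from observed progress can be illustrated with a minimal sketch. This is not LIMO's algorithm, which the abstract does not specify; it is a simple hill-climbing controller under stated assumptions. The names `tune_thread_count`, `measure_throughput`, and `toy_throughput` are hypothetical, the contention model is invented for illustration, and voltage and frequency scaling is not modeled at all.

```python
from typing import Callable, Optional


def tune_thread_count(
    measure_throughput: Callable[[int], float],
    max_threads: int,
    start: Optional[int] = None,
    max_steps: int = 64,
) -> int:
    """Hill-climb over thread counts using a measured progress rate.

    measure_throughput(n) is a hypothetical probe: it should run the
    workload briefly with n threads and return work completed per second.
    """
    current = max_threads if start is None else start
    best_rate = measure_throughput(current)
    for _ in range(max_steps):
        moved = False
        # Try one fewer thread first, then one more.
        for candidate in (current - 1, current + 1):
            if 1 <= candidate <= max_threads:
                rate = measure_throughput(candidate)
                if rate > best_rate:
                    current, best_rate, moved = candidate, rate, True
                    break
        if not moved:
            break  # neither neighbor is better: local optimum reached
    return current


def toy_throughput(n: int) -> float:
    """Invented contention model: linear speedup damped by an n^2 penalty."""
    return n / (1.0 + 0.01 * n * n)


if __name__ == "__main__":
    # Start from the default of one thread per core on a 32-core machine.
    print(tune_thread_count(toy_throughput, max_threads=32))  # prints 10
```

In this toy model, throughput peaks at 10 threads, so the controller backs off from 32 threads to 10. A real runtime would replace `toy_throughput` with measurements taken while the application runs and would re-run the search as the workload changes.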