A load balancing inspired optimization framework for exascale multicore systems: A complex networks approach
Authors: Yao Xiao, Yuankun Xue, Shahin Nazarian, P. Bogdan
Published in: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2017-11-13
DOI: https://doi.org/10.1109/ICCAD.2017.8203781
Citations: 41
Abstract
Many-core multi-threaded performance is plagued by on-chip communication nonidealities, limited memory bandwidth, and critical sections. Inspired by the complex network theory of social communities, we propose a novel methodology to model the dynamic execution of an application and partition the application into an optimal number of clusters for parallel execution. We first apply an LLVM IR compiler analysis to a specific application and construct a dynamic application dependency graph encoding its computational and memory operations. Next, based on this graph, we propose an optimization model to find the optimal clusters such that (1) the intra-cluster edges are maximized, (2) the execution times of the clusters are nearly equalized, for load balancing, and (3) the cluster size does not exceed the core count. Our approach confines data movement mainly to within a cluster, which reduces power consumption and prevents congestion. Finally, we propose an algorithm to sort the graph of connected clusters topologically and map the clusters onto a network-on-chip (NoC). Experimental results on a 32-core NoC demonstrate a maximum speedup of 131.82% when compared to thread-based execution. Furthermore, the scalability of our framework makes it a promising software design automation platform.
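As a rough illustration of criteria (1) and (2) and of the topological ordering step, the sketch below scores a candidate partition of a small weighted dependency graph and then orders the resulting clusters. It is not the paper's optimization model or mapping algorithm: the function names (`partition_metrics`, `cluster_topological_order`), the toy graph, the use of a max-minus-min load gap as the imbalance measure, and Kahn's algorithm for the ordering are all assumptions made for this example.

```python
from collections import defaultdict, deque


def partition_metrics(edges, node_time, cluster_of):
    """Score a candidate partition (hypothetical helper, not from the paper).

    edges:      {(src, dst): weight} directed dependency edges
    node_time:  {node: estimated execution time}
    cluster_of: {node: cluster id}
    Returns (intra-cluster edge weight, load imbalance, per-cluster load).
    """
    # Criterion (1): total weight of edges whose endpoints share a cluster.
    intra = sum(w for (u, v), w in edges.items()
                if cluster_of[u] == cluster_of[v])
    # Criterion (2): per-cluster execution time; the gap between the heaviest
    # and lightest cluster is used here as a simple imbalance measure.
    load = defaultdict(float)
    for node, t in node_time.items():
        load[cluster_of[node]] += t
    imbalance = max(load.values()) - min(load.values())
    return intra, imbalance, dict(load)


def cluster_topological_order(edges, cluster_of):
    """Order clusters so every inter-cluster dependency points forward
    (Kahn's algorithm, chosen here as an assumption)."""
    succ = defaultdict(set)
    indeg = {c: 0 for c in set(cluster_of.values())}
    for (u, v) in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv and cv not in succ[cu]:
            succ[cu].add(cv)
            indeg[cv] += 1
    ready = deque(sorted(c for c, d in indeg.items() if d == 0))
    order = []
    while ready:
        c = ready.popleft()
        order.append(c)
        for n in sorted(succ[c]):
            indeg[n] -= 1
            if indeg[n] == 0:
                ready.append(n)
    if len(order) != len(indeg):
        raise ValueError("cluster graph has a cycle; no topological order")
    return order


# Toy dependency graph (made-up data for illustration only).
edges = {("a", "b"): 3, ("b", "c"): 1, ("c", "d"): 2, ("a", "c"): 1}
node_time = {"a": 2.0, "b": 3.0, "c": 4.0, "d": 1.0}
cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1}

intra, imbalance, load = partition_metrics(edges, node_time, cluster_of)
order = cluster_topological_order(edges, cluster_of)
print(intra, imbalance)  # 5 0.0
print(order)             # [0, 1]
```

In this toy case the partition keeps 5 of the 7 units of edge weight inside clusters and balances the two loads exactly; an optimizer would search over `cluster_of` assignments to trade these quantities off, and the resulting order could then drive placement of clusters on the NoC.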