An Optimization Method for Embarrassingly Parallel under MIC Architecture
Authors: Yunchun Li, Xiduo Tian
Venue: 2015 14th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES)
Publication date: 2015-08-18
DOI: 10.1109/DCABES.2015.12 (https://doi.org/10.1109/DCABES.2015.12)
Citations: 1
Abstract
Heterogeneous architectures that pair a CPU with an accelerator have become mainstream in supercomputing. Intel launched its Xeon Phi coprocessor in this context. It uses Intel's many-core (MIC) architecture, which greatly improves single-node parallelism. This paper studies the optimization of embarrassingly parallel programs on the Intel MIC architecture, aiming to maximize the utilization of both the CPU and the Phi coprocessor and to reduce the running time of parallel programs by combining the computing power of the two. Such embarrassingly parallel programs often have do-all main loops, that is, loops with no dependencies between iterations, which can therefore be fully parallelized. Do-all loops occur in many typical parallel programs. We propose a loop allocation method for do-all loops on this CPU/MIC architecture to meet the above performance objectives.
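The abstract does not describe the allocation method itself, so the following is only a minimal sketch of one common way to divide a do-all loop between two devices: a static split proportional to each device's throughput. It is not the authors' method. The names `split_iterations`, `run_do_all`, `cpu_rate`, and `phi_rate` are hypothetical, and the two threads merely stand in for the CPU and the Phi coprocessor.

```python
from concurrent.futures import ThreadPoolExecutor


def split_iterations(n, cpu_rate, phi_rate):
    """Split range(n) into two contiguous chunks sized in proportion to
    the (assumed, measured elsewhere) throughput of each device."""
    if n < 0 or cpu_rate <= 0 or phi_rate <= 0:
        raise ValueError("n must be >= 0 and both rates must be > 0")
    cpu_count = round(n * cpu_rate / (cpu_rate + phi_rate))
    return range(0, cpu_count), range(cpu_count, n)


def body(i):
    """Loop body of the do-all loop: depends only on its own index."""
    return i * i


def run_chunk(chunk):
    """Run the loop body over one chunk of iterations."""
    return [body(i) for i in chunk]


def run_do_all(n, cpu_rate, phi_rate):
    """Run both chunks concurrently and concatenate results in index order.
    Each worker thread is a stand-in for one device (an assumption)."""
    cpu_chunk, phi_chunk = split_iterations(n, cpu_rate, phi_rate)
    with ThreadPoolExecutor(max_workers=2) as pool:
        cpu_future = pool.submit(run_chunk, cpu_chunk)
        phi_future = pool.submit(run_chunk, phi_chunk)
        return cpu_future.result() + phi_future.result()


if __name__ == "__main__":
    # Assume the coprocessor is three times as fast as the CPU for this loop.
    cpu_chunk, phi_chunk = split_iterations(100, cpu_rate=1.0, phi_rate=3.0)
    print(len(cpu_chunk), len(phi_chunk))
    print(run_do_all(10, cpu_rate=1.0, phi_rate=1.0))
```

Because the iterations are independent, the chunks can run in any order and on any device without changing the result; only the split ratio affects how evenly the two devices finish.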