Chunk-Wise Parallelization Based on Dynamic Performance Prediction on Heterogeneous Multicores
A. Dab, Y. Slama. 2017 International Conference on High Performance Computing & Simulation (HPCS), July 2017. DOI: 10.1109/HPCS.2017.28
Multicore machines are becoming increasingly common, and ideally every application should benefit from these advances in computer architecture. A key challenge in parallel computing is balancing the load across cores so as to minimize the overall execution time of a parallel program, known as its makespan. Because the cores of a multicore machine may have different architectures, an effective mapping must accommodate this variation, which is not known in advance, to avoid degrading the makespan. Moreover, a static mapping or load-balancing method may be ineffective when the state of the target machine changes during program execution. Thread affinity has emerged as an important technique for improving both program performance and performance stability. In this context, we propose a predictive approach that chunks loop iterations at runtime, allowing parallel code to adapt to the processor's performance. Our approach relies on thread pinning and performance detection at execution time. From a parallel program, we take a set of loop-nest iterations, called a chunk, and run it with an initial mapping that assumes homogeneous cores. A performance assessment then corrects the mapping by predicting the future state of each core. The new mapping is applied to the next chunk for further evaluation and prediction. The process stops when the program has fully executed or when chunking is judged to be no longer effective.
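The chunk, measure, predict, remap loop described in the abstract can be sketched as a small deterministic simulation. This is not the authors' implementation: the exponential-smoothing predictor, the `alpha` and `stop_imbalance` parameters, the imbalance-based stop rule, and the `core_speed` callback (a hypothetical stand-in for measuring the throughput of a thread pinned to each core) are all illustrative assumptions. The sketch also assumes a barrier between chunks, so the makespan is the sum of each chunk's slowest-core time.

```python
from typing import Callable, List, Tuple


def proportional_split(n: int, weights: List[float]) -> List[int]:
    """Split n iterations across cores in proportion to weights (largest-remainder rounding)."""
    total = sum(weights)
    raw = [n * w / total for w in weights]
    counts = [int(r) for r in raw]  # floor, since every raw share is non-negative
    leftover = n - sum(counts)
    # Give the leftover iterations to the cores with the largest fractional shares.
    by_fraction = sorted(range(len(weights)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


def chunked_run(
    total_iters: int,
    n_cores: int,
    chunk_size: int,
    core_speed: Callable[[int, int], float],  # hypothetical: (core, chunk index) -> iterations/second
    alpha: float = 0.5,  # hypothetical smoothing weight for the newest measurement
    stop_imbalance: float = 0.05,  # hypothetical threshold for "chunking no longer effective"
) -> Tuple[float, int]:
    """Simulate chunk-wise remapping; returns (makespan, number of chunks executed)."""
    predicted = [1.0] * n_cores  # first mapping assumes homogeneous cores
    done, step, makespan = 0, 0, 0.0
    while done < total_iters:
        size = min(chunk_size, total_iters - done)
        counts = proportional_split(size, predicted)
        # Stand-in for running the chunk with one pinned thread per core:
        # a core's time is its iteration count divided by its current speed.
        times = [counts[c] / core_speed(c, step) for c in range(n_cores)]
        chunk_time = max(times)  # barrier: the chunk ends when its slowest core ends
        makespan += chunk_time
        done += size
        step += 1
        # Fold each measured throughput into the prediction (exponential smoothing).
        for c in range(n_cores):
            if counts[c] > 0:
                observed = counts[c] / times[c]
                predicted[c] = alpha * observed + (1 - alpha) * predicted[c]
        # Hypothetical stop rule: once a chunk is nearly balanced, further chunking
        # only adds synchronization, so the remaining iterations run as one chunk.
        imbalance = 1.0 - (sum(times) / n_cores) / chunk_time
        if imbalance < stop_imbalance:
            chunk_size = total_iters - done
    return makespan, step
```

For example, with two cores whose (hypothetical) speeds are 1 and 3 iterations per time unit, `chunked_run(400, 2, 40, lambda core, step: [1.0, 3.0][core], alpha=1.0)` returns `(110.0, 3)`: the first chunk is split evenly, the second is split 10/30 from the measured speeds, and the balanced result ends chunking. A static even split of the same 400 iterations would take 200 time units. A real runtime would replace `core_speed` with timings of threads pinned through an OS affinity interface, and a persistent change in machine state is exactly the case where keeping chunks small, rather than stopping early, pays off.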