Chunk-Wise Parallelization Based on Dynamic Performance Prediction on Heterogeneous Multicores

2017 International Conference on High Performance Computing & Simulation (HPCS), Pub Date: 2017-07-01, DOI: 10.1109/HPCS.2017.28

A. Dab, Y. Slama

Citation count: 2

Abstract

Multicore machines are becoming increasingly common, and ideally every application benefits from these advances in computer architecture. A difficult challenge in parallel computing is balancing the load across cores so as to minimize the overall execution time of the parallel program, called its makespan. Because the cores of a multicore machine may have different architectures, an effective mapping must tolerate this unknown variation to avoid degrading the makespan. Moreover, a mapping or static load-balancing method may be ineffective when the state of the target machine changes during program execution. Thread affinity has emerged as an important technique for improving program performance and performance stability. In this context, we propose a predictive approach that chunks loop iterations at runtime, allowing parallel code to adapt to the processors' performance. Our approach is based on thread pinning and on performance detection at execution time. From a parallel program, we take a set of loop-nest iterations, called a chunk, and run it using an initial mapping that assumes homogeneous cores. A performance assessment then corrects the mapping by predicting the future state of the cores. The new mapping is applied to the next chunk for further evaluation and prediction. The process stops when the program has fully executed or when chunking is judged to be no longer effective.
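
The abstract does not specify the paper's prediction model or its stopping rule, so the following is only a minimal Python sketch of the chunk-wise loop it describes, under explicit assumptions: the machine is simulated by a hypothetical `core_speed(core, chunk_index)` function instead of real pinned threads, each core's next state is predicted with exponential smoothing (weight `alpha`), and chunking is considered no longer effective once the predicted mapping changes by less than `tol` between chunks. The names `split_proportionally`, `chunked_run`, `alpha`, and `tol` are illustrative, not the authors' code.

```python
from typing import Callable, List, Tuple


def split_proportionally(n: int, weights: List[float]) -> List[int]:
    """Split n iterations into integer shares proportional to weights.

    Uses largest-remainder rounding so the shares always sum to exactly n.
    """
    total = sum(weights)
    raw = [n * w / total for w in weights]
    shares = [int(r) for r in raw]  # floor, since raw >= 0
    leftover = n - sum(shares)
    # Give the remaining iterations to the cores with the largest fractional parts.
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - shares[i], reverse=True)
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares


def chunked_run(
    n_iters: int,
    n_cores: int,
    chunk_size: int,
    core_speed: Callable[[int, int], float],
    alpha: float = 0.5,
    tol: float = 0.05,
) -> Tuple[float, List[List[int]]]:
    """Simulate chunk-wise execution of a parallel loop on heterogeneous cores.

    core_speed(core, chunk_index) is a hypothetical stand-in for the machine:
    it returns the iterations per time unit that core achieves during that chunk.
    Returns the simulated makespan and the per-core shares used for each chunk.
    """
    predicted = [1.0] * n_cores  # first mapping: assume homogeneous cores
    done, chunk_idx, makespan = 0, 0, 0.0
    log: List[List[int]] = []
    chunking = True
    while done < n_iters:
        # Once chunking is judged ineffective, run everything that is left at once.
        size = min(chunk_size, n_iters - done) if chunking else n_iters - done
        shares = split_proportionally(size, predicted)
        # Each share would run on a thread pinned to core c (e.g. via
        # os.sched_setaffinity on Linux); here the elapsed time is simulated.
        times = [shares[c] / core_speed(c, chunk_idx) for c in range(n_cores)]
        makespan += max(times)  # the chunk ends when its slowest core finishes
        log.append(shares)
        # Performance assessment: measured throughput, prediction kept for idle cores.
        observed = [
            shares[c] / times[c] if times[c] > 0 else predicted[c]
            for c in range(n_cores)
        ]
        # Predict each core's next state (assumption: exponential smoothing).
        new_pred = [alpha * o + (1 - alpha) * p for o, p in zip(observed, predicted)]
        change = max(abs(a - b) / b for a, b in zip(new_pred, predicted))
        predicted = new_pred
        done += size
        chunk_idx += 1
        if change < tol:  # mapping has stabilised, so stop chunking
            chunking = False
    return makespan, log


if __name__ == "__main__":
    speeds = [1.0, 3.0]  # hypothetical: core 1 is three times faster than core 0
    makespan, log = chunked_run(1000, 2, 100, lambda core, chunk: speeds[core])
    print(makespan, log[0], log[-1])  # 294.0 [50, 50] [155, 445]
```

With core 1 three times faster than core 0, the first chunk is split evenly, later chunks shift work toward core 1, and the remaining 600 iterations run as one final chunk once the mapping stabilises. The simulated makespan of 294 time units compares with 500 for a static even split of all 1000 iterations. Exponential smoothing is just one simple predictor; a smaller `alpha` makes it react more slowly to changes in core load.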
