A systematic process for efficient execution on Intel's heterogeneous computation nodes
Authors: A. Rane, J. Browne, L. Koesterke
DOI: 10.1145/2335755.2335797 (https://doi.org/10.1145/2335755.2335797)
Publication date: 2012-07-16
Venue (as listed in the record): Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.), volume 7 1, pages 8:1-8:8
Source: Semantic Scholar
Citations: 1
Abstract
Heterogeneous architectures (mainstream CPUs with accelerators or co-processors) are expected to become more prevalent in high-performance computing clusters. This paper deals specifically with attaining efficient execution on nodes that combine Intel's multicore Sandy Bridge chips with MIC manycore chips. The architecture and software stack of Intel's heterogeneous computation nodes attempt to make migration from the now-common multicore chips to the manycore chips straightforward. However, these manycore chips favor specific execution characteristics, such as use of the wider vector instructions and minimal inter-thread conflicts. Additionally, manycore chips have lower clock speeds and no unified last-level cache. As a result, and as we demonstrate in this paper, it will commonly be the case that some parts of an application will not execute more efficiently on the manycore chip than on the multicore chip. This paper presents a process, based on measurements of execution on Westmere-based multicore chips, that can accurately predict which code segments will execute efficiently on the manycore chips, and it illustrates and evaluates the application of that process to three substantial full programs: HOMME, MOIL, and MILC. The effectiveness of the process is validated by verifying, on a Knights Ferry computation node, the scalability of the specific functions and loops that were recommended for MIC execution.
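The abstract names the characteristics that the manycore chips favor but does not spell out the prediction procedure itself. As a purely illustrative sketch, and not the authors' method, the idea of classifying code segments from multicore measurements could look like the following. The `SegmentProfile` fields, the three metrics, and all thresholds are hypothetical assumptions introduced here for illustration; none of them come from the paper.

```python
from dataclasses import dataclass


@dataclass
class SegmentProfile:
    """Hypothetical per-segment measurements taken on the multicore chip."""
    name: str
    vectorized_fraction: float  # assumed: share of arithmetic done in vector instructions (0..1)
    parallel_efficiency: float  # assumed: speedup divided by thread count (0..1)
    llc_dependence: float       # assumed: share of accesses served by the shared last-level cache (0..1)


def recommend_for_manycore(seg, min_vector=0.5, min_efficiency=0.7, max_llc=0.3):
    """Return True if the segment shows all three favorable characteristics.

    The thresholds are illustrative placeholders, not values from the paper.
    """
    return (
        seg.vectorized_fraction >= min_vector       # benefits from wider vector units
        and seg.parallel_efficiency >= min_efficiency  # few inter-thread conflicts
        and seg.llc_dependence <= max_llc           # tolerates no unified last-level cache
    )


def partition(segments):
    """Split segment names into (manycore candidates, multicore candidates)."""
    manycore = [s.name for s in segments if recommend_for_manycore(s)]
    multicore = [s.name for s in segments if not recommend_for_manycore(s)]
    return manycore, multicore
```

In this sketch a segment is recommended for the manycore chip only when every criterion holds, which mirrors the abstract's point that some parts of an application are better left on the multicore chip.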