Enabling performance portability of data-parallel OpenMP applications on asymmetric multicore processors
Juan Carlos Saez, Fernando Castro, Manuel Prieto-Matias
arXiv:2402.07664 [cs.OS], published 2024-02-12
Abstract
Asymmetric multicore processors (AMPs) couple high-performance big cores with low-power small cores that share the same instruction-set architecture but differ in features such as clock frequency or microarchitecture. Previous work has shown that asymmetric designs can deliver higher energy efficiency than symmetric multicores for diverse workloads. Despite these benefits, AMPs pose significant challenges to the runtime systems of parallel programming models. While prior work has mainly explored how to execute task-based parallel applications efficiently on AMPs via runtime-system enhancements, improving the performance of unmodified data-parallel applications on these architectures remains a major challenge. In this work we analyze the particular case of loop-based OpenMP applications, which are widely used in scientific and engineering domains and constitute the dominant application type in many parallel benchmark suites used for performance evaluation on multicore systems. We observed that conventional OpenMP loop-scheduling approaches are unable to cope efficiently with the load imbalance that naturally stems from the different performance delivered by big and small cores.
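As a concrete illustration (this snippet is ours, not from the paper), the loop below uses OpenMP's standard schedule(static) clause, which hands every thread an equal share of iterations. On an AMP, threads pinned to small cores finish their equal-sized chunks later than threads on big cores, so the big cores idle at the implicit barrier that closes the loop:

```c
#include <stdio.h>

#define N 1000000

/* Equal-share static scheduling: with T threads, each receives roughly
 * N/T contiguous iterations. On an AMP, threads on slow (small) cores
 * take longer to finish their share, so fast (big) cores sit idle at
 * the implicit barrier at the end of the parallel loop. */
int main(void) {
    static double a[N], b[N];
    for (int i = 0; i < N; i++)
        b[i] = (double)i;

    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * b[i] + 1.0;

    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}
```

Compile with, e.g., gcc -fopenmp. Switching to schedule(dynamic) reduces the imbalance by letting faster threads grab more chunks, but introduces per-chunk synchronization overhead.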
To address this shortcoming, we propose Asymmetric Iteration Distribution (AID), a set of novel loop-scheduling methods for AMPs that distribute iterations unevenly across worker threads to deal efficiently with performance asymmetry. We implemented AID in libgomp, the GNU OpenMP runtime system, and evaluated it on two different asymmetric multicore platforms. Our analysis reveals that the AID methods are effective replacements for the static and dynamic scheduling methods on AMPs, improving performance over these conventional strategies by up to 56% and 16.8%, respectively.
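The abstract does not spell out how AID partitions iterations, so the sketch below is only a minimal illustration of the underlying idea: give each worker thread a contiguous chunk whose size is proportional to the relative speed of its core. The thread-to-core mapping (the first NBIG threads on big cores) and the speedup factor BIG_SPEEDUP are hypothetical assumptions for this example; the actual AID methods are implemented inside libgomp's loop scheduler and are more elaborate.

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000

/* Hypothetical platform description (illustrative assumptions, not
 * measured data): the first NBIG threads run on big cores, each
 * assumed BIG_SPEEDUP times faster than a small core. Assumes the
 * team has more than NBIG threads. */
#define NBIG        4
#define BIG_SPEEDUP 2.0

int main(void) {
    static double a[N], b[N];
    for (int i = 0; i < N; i++)
        b[i] = (double)i;

    #pragma omp parallel
    {
        int tid      = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        int nsmall   = nthreads - NBIG;

        /* Total "speed units" across all threads; each thread's share
         * of the iteration space is proportional to its weight. */
        double total = NBIG * BIG_SPEEDUP + nsmall * 1.0;
        double w     = (tid < NBIG) ? BIG_SPEEDUP : 1.0;

        /* Prefix sum of the weights of threads 0..tid-1 marks where
         * this thread's contiguous chunk begins. */
        double before = (tid < NBIG)
            ? tid * BIG_SPEEDUP
            : NBIG * BIG_SPEEDUP + (tid - NBIG) * 1.0;

        int start = (int)(N * before / total);
        int end   = (int)(N * (before + w) / total);
        if (tid == nthreads - 1)
            end = N;  /* absorb rounding residue in the last chunk */

        for (int i = start; i < end; i++)
            a[i] = 2.0 * b[i] + 1.0;
    }

    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}
```

In practice, a scheme like this also needs thread affinity (e.g., OMP_PROC_BIND and OMP_PLACES) so that thread IDs actually map onto big and small cores as assumed, and the speedup factor would be measured for the target loop rather than fixed at compile time.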