Fast data-dependence profiling through prior static analysis
Mohammad Norouzi, Nicolas Morew, Qamar Ilias, Lukas Rothenberger, Ali Jannesari, Felix Wolf
Parallel Computing, Volume 119, Article 103063 (published 2024-01-11). DOI: 10.1016/j.parco.2024.103063
https://www.sciencedirect.com/science/article/pii/S0167819124000012
Citations: 0
Abstract
Data-dependence profiling is a program-analysis technique for detecting parallelism opportunities in sequential programs. It captures the data dependences that actually occur during program execution, filtering out parallelism-preventing dependences that purely static methods assume only because they lack critical runtime information, such as the values of pointers and array indices. Profiling, however, suffers from high runtime overhead. In our earlier work, we accelerated data-dependence profiling by excluding polyhedral loops, which certain compilers can handle statically, and by eliminating scalar variables that create statically identifiable data dependences. In this paper, we combine the two methods and integrate them into DiscoPoP, a data-dependence profiler and parallelism discovery tool. Additionally, we detect reduction patterns statically and unify the three static analyses within the DiscoPoP framework, significantly diminishing the profiling overhead for a wider range of programs. We have evaluated our unified approach on 49 benchmarks from three benchmark suites and two computer simulation applications. The evaluation results show that our approach reports fewer false positive and false negative data dependences than the original data-dependence profiler and reduces the profiling time by at least 43%, with a median reduction of 76% across all programs. Also, we identify 40% of the reduction cases statically and eliminate the associated profiling overhead for these cases.
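To illustrate the distinction the abstract draws, consider the following minimal sketch in C (hypothetical examples, not taken from the paper). In the first loop, whether a loop-carried dependence exists depends on the runtime contents of idx, so a purely static analyzer must conservatively assume one, while a profiler observes the actual index values; the second loop is a reduction whose fixed update pattern (sum += ...) can be recognized statically, which is the kind of case the paper's static reduction detection removes from profiling.

```c
#include <stdio.h>

/* Case 1: a static analyzer must assume a loop-carried dependence,
 * because out[idx[i]] could alias out[idx[j]] for i != j. A profiler
 * sees the actual indices at runtime; if idx never repeats a value,
 * the iterations are independent and the loop is parallelizable. */
void scatter(double *out, const double *in, const int *idx, int n) {
    for (int i = 0; i < n; i++)
        out[idx[i]] = in[i] * 2.0;
}

/* Case 2: a reduction. The loop-carried dependence on sum follows a
 * fixed associative update scheme, so it can be detected statically
 * and excluded from runtime profiling. */
double sum_array(const double *a, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

int main(void) {
    double in[4] = {1.0, 2.0, 3.0, 4.0}, out[4];
    int idx[4] = {3, 2, 1, 0};  /* a permutation: no two iterations write the same element */
    scatter(out, in, idx, 4);
    printf("sum = %f\n", sum_array(out, 4));
    return 0;
}
```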
About the journal:
Parallel Computing is an international journal presenting the practical use of parallel computer systems, including high-performance architecture, system software, programming systems and tools, and applications. Within this context, the journal covers all aspects of high-end parallel computing, from single homogeneous or heterogeneous computing nodes to large-scale multi-node systems.
Parallel Computing features original research work and review articles, as well as novel or illustrative accounts of application experience with (and techniques for) the use of parallel computers. We also welcome studies that reproduce prior publications, whether they confirm or disprove the published results.
Particular technical areas of interest include, but are not limited to:
-System software for parallel computer systems, including programming languages (new languages as well as compilation techniques), operating systems (including middleware), and resource management (scheduling and load-balancing).
-Enabling software, including debuggers, performance tools, and system and numeric libraries.
-General hardware (architecture) concepts, new technologies enabling the realization of such new concepts, and details of commercially available systems.
-Software engineering and productivity as they relate to parallel computing.
-Applications (including scientific computing, deep learning, and machine learning) or tool case studies demonstrating novel ways to achieve parallelism.
-Performance measurement results on state-of-the-art systems.
-Approaches to effectively utilize large-scale parallel computing, including new algorithms or algorithm analysis with demonstrated relevance to real applications on existing or next-generation parallel computer architectures.
-Parallel I/O systems, both hardware and software.
-Networking technology supporting high-speed computing, demonstrating the impact of high-speed computation on parallel applications.