Run-time recognition of task parallelism within the P++ parallel array class library
R. Parsons, D. Quinlan
Proceedings of Scalable Parallel Libraries Conference, 1993-10-06
DOI: 10.1109/SPLC.1993.365580
Citations: 32
Abstract
This paper explores the use of a run-time system to recognize task parallelism within a C++ array class library. Run-time systems currently support data parallelism in P++, Fortran 90D, and High Performance Fortran. However, data parallelism alone is insufficient for many applications, including adaptive mesh refinement: without access to both data and task parallelism, such applications exhibit several orders of magnitude more message passing and poor performance. In this paper, a C++ array class library is used to implement deferred evaluation and run-time dependence analysis for task parallelism recognition, obtaining task parallelism through a data flow interpretation of data-parallel array statements. Performance results show that the analysis and optimizations are both efficient and practical, allowing us to consider more substantial optimizations.
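The combination of deferred evaluation and run-time dependence analysis can be illustrated with a minimal C++ sketch. It is not P++ code and not the authors' implementation. The class `DeferredArray`, the types `Statement` and `Expr`, and the functions `depends`, `taskLevels`, and `exampleLevels` are hypothetical names. Dependences are tested on whole arrays rather than on array sections or distributed partitions, and no tasks are actually launched. Array assignments are recorded instead of executed, and the recorded statements are then grouped into levels of a data flow graph. Statements in the same level are candidates for concurrent execution as tasks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch (not the P++ API): a deferred statement records the
// array it writes and the arrays it reads instead of executing immediately.
struct Statement {
    std::string text;     // readable form, e.g. "A = B + C"
    int write;            // id of the array assigned to
    std::set<int> reads;  // ids of the arrays read on the right-hand side
};

// Right-hand-side expression: only the operands that are read are tracked.
struct Expr {
    std::set<int> reads;
    std::string text;
};

// Array handle whose assignment operator enqueues a Statement (deferred
// evaluation) rather than computing any element values.
class DeferredArray {
public:
    DeferredArray(std::string name, std::vector<Statement>& queue)
        : id_(nextId()++), name_(std::move(name)), queue_(&queue) {}

    // Using an array as an operand yields an expression that reads it.
    operator Expr() const { return Expr{{id_}, name_}; }

    DeferredArray& operator=(const Expr& rhs) {
        queue_->push_back(Statement{name_ + " = " + rhs.text, id_, rhs.reads});
        return *this;
    }

private:
    static int& nextId() { static int n = 0; return n; }
    int id_;
    std::string name_;
    std::vector<Statement>* queue_;
};

// Elementwise addition, recorded symbolically: the result reads both operands.
Expr operator+(const Expr& a, const Expr& b) {
    Expr r{a.reads, a.text + " + " + b.text};
    r.reads.insert(b.reads.begin(), b.reads.end());
    return r;
}

// Whole-array dependence test: flow (read after write), anti (write after
// read), or output (write after write).
bool depends(const Statement& earlier, const Statement& later) {
    return later.reads.count(earlier.write) > 0 ||
           earlier.reads.count(later.write) > 0 ||
           earlier.write == later.write;
}

// Data flow interpretation: each statement's level is one more than the
// deepest earlier statement it depends on. Statements sharing a level have
// no dependences among them and could run as independent tasks.
std::vector<std::vector<std::size_t>> taskLevels(const std::vector<Statement>& stmts) {
    std::vector<std::size_t> level(stmts.size(), 0);
    std::size_t depth = 0;
    for (std::size_t j = 0; j < stmts.size(); ++j) {
        for (std::size_t i = 0; i < j; ++i)
            if (depends(stmts[i], stmts[j]) && level[i] + 1 > level[j])
                level[j] = level[i] + 1;
        assert(level[j] <= j);  // a statement is never deeper than its position
        depth = std::max(depth, level[j] + 1);
    }
    std::vector<std::vector<std::size_t>> levels(depth);
    for (std::size_t j = 0; j < stmts.size(); ++j) levels[level[j]].push_back(j);
    return levels;
}

// Example: the first two statements touch disjoint arrays, so they share a
// level; the third reads A and D and must wait for both.
std::vector<std::vector<std::size_t>> exampleLevels() {
    std::vector<Statement> queue;
    DeferredArray A("A", queue), B("B", queue), C("C", queue), D("D", queue), E("E", queue);
    A = B + C;  // nothing is computed yet; the statement is only recorded
    D = E + E;
    B = A + D;
    return taskLevels(queue);  // {{0, 1}, {2}}
}
```

In `exampleLevels`, `A = B + C` and `D = E + E` touch disjoint arrays and land in the same level, while `B = A + D` reads both results and falls into the next level. Testing dependences on whole arrays keeps the sketch short but is conservative: two statements that write disjoint sections of the same array are serialized here.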