{"title":"A flexible sparse matrix data format and parallel algorithms for the assembly of finite element matrices on shared memory systems","authors":"Adam Sky , César Polindara , Ingo Muench , Carolin Birk","doi":"10.1016/j.parco.2023.103039","DOIUrl":"https://doi.org/10.1016/j.parco.2023.103039","url":null,"abstract":"<div><p><span>Finite element methods<span><span> require the composition of the global stiffness matrix from local finite element contributions. The composition process combines the computation of </span>element stiffness matrices<span> and their assembly into the global stiffness matrix, which is commonly sparse. In this paper we focus on the assembly process of the global stiffness matrix and explore different algorithms and their efficiency on shared memory systems using C</span></span></span><span>++</span><span>. A key aspect of our investigation is the use of atomic synchronization primitives for the derivation of data-race free algorithms and data structures. Furthermore, we propose a new flexible storage format for sparse matrices and compare its performance with the compressed row storage format using abstract benchmarks based on common characteristics of finite element problems.</span></p></div>","PeriodicalId":54642,"journal":{"name":"Parallel Computing","volume":"117 ","pages":"Article 103039"},"PeriodicalIF":1.4,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49877861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New YARN sharing GPU based on graphics memory granularity scheduling","authors":"Jinliang Shi , Dewu Chen , Jiabi Liang , Lin Li , Yue Lin , Jianjiang Li","doi":"10.1016/j.parco.2023.103038","DOIUrl":"https://doi.org/10.1016/j.parco.2023.103038","url":null,"abstract":"<div><p>As one of the most widely used cluster scheduling frameworks, Hadoop<span> YARN only supported CPU and memory scheduling in the past. Furthermore, due to the widespread use of AI<span>, the demand for GPU<span> is also increasing. So Hadoop YARN V3.0 adds GPU scheduling, but the granularity<span> is on the whole card yet, rather than finer-grained graphics memory scheduling. However, during daily training, although the graphics memory required by tasks may be much smaller than the whole GPU card, they will occupy the whole card, which results in wasted resources. To address this issue, Tensorflow provides the API for graphics memory control. Therefore, we propose to introduce this feature into Hadoop YARN so that it can support the heterogeneous scheduling: CPU, memory and graphics memory. Then we take HadoopV2.7 source code as the underlying architecture and design a new scheduler GSHARE. Compared with previous scheduling strategies, with 3 nodes, 3 GPU cards per node, and 12G graphics memory per card, GSHARE improves efficiency by up to 74% for Tensorflow tasks with 2G of graphics memory. Meanwhile, it minimizes the problem of wasted graphics memory caused by the inability to control graphics memory proportionally by the API of Tensorflow for multiple-card.</span></span></span></span></p></div>","PeriodicalId":54642,"journal":{"name":"Parallel Computing","volume":"117 ","pages":"Article 103038"},"PeriodicalIF":1.4,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49877862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using heterogeneous GPU nodes with a Cabana-based implementation of MPCD","authors":"Rene Halver , Christoph Junghans , Godehard Sutmann","doi":"10.1016/j.parco.2023.103033","DOIUrl":"https://doi.org/10.1016/j.parco.2023.103033","url":null,"abstract":"<div><p><span>The Kokkos based library Cabana, which has been developed in the Co-design Center for Particle Applications (CoPA), is used for the implementation of Multi-Particle Collision Dynamics<span> (MPCD), a particle-based description of hydrodynamic interactions. Cabana allows for a function portable implementation, which has been used to study the </span></span>interplay<span> between CPU<span> and GPU usage on a multi-node system as well as analysis of said interplay with performance analysis tools. As a result, we see most advantages in a homogeneous GPU usage, but we also discuss the extent to which heterogeneous applications might be more performant, using both CPU and GPU concurrently.</span></span></p></div>","PeriodicalId":54642,"journal":{"name":"Parallel Computing","volume":"117 ","pages":"Article 103033"},"PeriodicalIF":1.4,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49877857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editorial on Advances in High Performance Programming","authors":"Ami Marowka , Przemysław Stpiczyński","doi":"10.1016/j.parco.2023.103037","DOIUrl":"https://doi.org/10.1016/j.parco.2023.103037","url":null,"abstract":"","PeriodicalId":54642,"journal":{"name":"Parallel Computing","volume":"117 ","pages":"Article 103037"},"PeriodicalIF":1.4,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49877858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptively parallel runtime verification based on distributed network for temporal properties","authors":"Bin Yu , Xu Lu , Cong Tian , Meng Wang , Chu Chen , Ming Lei , Zhenhua Duan","doi":"10.1016/j.parco.2023.103034","DOIUrl":"https://doi.org/10.1016/j.parco.2023.103034","url":null,"abstract":"<div><p>Runtime verification<span><span> is a lightweight verification technique that verifies whether a monitored program execution satisfies a desired property. Online runtime verification faces challenges regarding efficiency and property expressiveness, which limit its widespread adoption. However, there is a lack of research that addresses both of these issues. With the basis of a distributed network, we propose an adaptively parallel approach to verify full regular temporal properties of C programs in an online manner. During program execution, segments of the generated state sequence are verified by distributed machines concurrently, while each segment is also verified in each multi-core machine with an adaptive number of </span>threads. Experimental results demonstrate that, with supporting more expressive properties, our approach has a speedup of 2.5X–5.0X compared with other runtime verification approaches.</span></p></div>","PeriodicalId":54642,"journal":{"name":"Parallel Computing","volume":"117 ","pages":"Article 103034"},"PeriodicalIF":1.4,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49877448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using heterogeneous GPU nodes with a Cabana-based implementation of MPCD","authors":"R. Halver, Christoph Junghans, G. Sutmann","doi":"10.1016/j.parco.2023.103033","DOIUrl":"https://doi.org/10.1016/j.parco.2023.103033","url":null,"abstract":"","PeriodicalId":54642,"journal":{"name":"Parallel Computing","volume":"117 1","pages":"103033"},"PeriodicalIF":1.4,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"55107193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Big data BPMN workflow resource optimization in the cloud","authors":"Srđan Daniel Simić, Nikola Tanković, Darko Etinger","doi":"10.1016/j.parco.2023.103025","DOIUrl":"https://doi.org/10.1016/j.parco.2023.103025","url":null,"abstract":"<div><p>Cloud computing is one of the critical technologies that meet the demand of various businesses for the high-capacity computational processing power needed to gain knowledge from their ever-growing business data. When utilizing cloud computing resources to deal with Big Data processing, companies face the challenge of determining the optimal use of resources within their business processes. The miscalculation of the necessary resources directly affects their budget and can cause delays in the cycle time of their key processes. This study investigates the simulation of cloud resource optimization for Big Data workflows modeled with the Business Process Modeling Notation (BPMN). To this end, a BPMN performance evaluation framework was developed. The framework’s capabilities were presented using real-world data science workflow and later evaluated on workflows consisting of 13, 52, and 104 tasks. The results show that the developed framework is adequate for estimating the overall run-time distribution and optimizing the cloud resource deployment and that the BPMN can be utilized for Big Data processing workflows. Therefore, this study contributes to BPMN practitioners by providing a tool to apply BPMN for their Big Data workflows and decision-makers by giving them critical insights into their key business processes. The framework source code is available at <span>https://github.com/ntankovic/python-bpmn-engine</span><svg><path></path></svg>.</p></div>","PeriodicalId":54642,"journal":{"name":"Parallel Computing","volume":"117 ","pages":"Article 103025"},"PeriodicalIF":1.4,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49877447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding inputs that trigger floating-point exceptions in heterogeneous computing via Bayesian optimization","authors":"I. Laguna, Anh Tran, G. Gopalakrishnan","doi":"10.1016/j.parco.2023.103042","DOIUrl":"https://doi.org/10.1016/j.parco.2023.103042","url":null,"abstract":"","PeriodicalId":54642,"journal":{"name":"Parallel Computing","volume":"62 1","pages":"103042"},"PeriodicalIF":1.4,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"55107870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A flexible sparse matrix data format and parallel algorithms for the assembly of finite element matrices on shared memory systems","authors":"A. Sky, César Polindara, I. Muench, C. Birk","doi":"10.1016/j.parco.2023.103039","DOIUrl":"https://doi.org/10.1016/j.parco.2023.103039","url":null,"abstract":"","PeriodicalId":54642,"journal":{"name":"Parallel Computing","volume":"117 1","pages":"103039"},"PeriodicalIF":1.4,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"55107767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}