Exploring HPC Parallelism with Data-Driven Multithreading
C. Christofi, G. Michael, P. Trancoso, P. Evripidou
2012 Data-Flow Execution Models for Extreme Scale Computing, 19 September 2012
DOI: 10.1109/DFM.2012.11
Citations: 12
Abstract
The switch to multi-core systems has ended the reliance on ever-faster single processors for performance gains and shifted the focus to parallelism. However, the exponential growth in single-processor performance during the 1980s and 1990s overshadowed the drive for efficient parallelism and relegated it to a niche research area, mostly within High Performance Computing (HPC). Parallelism is now at the forefront and carries the burden of utilising the extra resources provided by Moore's law to sustain the exponential growth of computing systems. In the drive to adopt parallel models of computation, Data-Flow models have recently been "re-visited" for exploiting parallelism in multi-core and many-core systems. Data-Driven Multithreading (DDM) is one such model: it is based on dynamic Data-Flow principles and can expose the maximum parallelism of an application. DDM schedules threads based on data availability, as dictated by a producer-consumer graph, and enforces single-assignment semantics on the data passed from producer to consumer. In this paper we present a preliminary evaluation of whether DDM is a viable candidate for HPC. We study the scalability of a small subset of the LINPACK benchmark using DDM on a 48-core system. We implement three test-case operations (matrix multiplication, LU decomposition, and Cholesky decomposition) and use them to evaluate scalability and performance, relying on optimized linear algebra kernels for the basic operations performed within each thread. We compare our DDM implementations against PLASMA, a state-of-the-art linear algebra library for HPC, and show that applications using the DDM model scale efficiently, achieving a performance improvement of up to 2×.
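To make the scheduling idea concrete, here is a minimal, sequential Python sketch of data-driven execution under a simplified model, assumed for illustration: each thread declares the values it consumes and the single value it produces, keeps a ready count of inputs not yet available, and becomes eligible to run when that count reaches zero. The names `DThread`, `run_data_driven`, and `tiled_matmul_threads` are hypothetical and are not the API of the DDM runtime used in the paper. A real DDM scheduler would dispatch ready threads to many cores and call optimized linear algebra kernels instead of the scalar arithmetic used here.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class DThread:
    """Hypothetical DDM-style thread: runs once all of its inputs exist."""
    name: str                    # assumed unique across the graph
    inputs: List[str]            # names of values this thread consumes
    output: str                  # name of the single value it produces
    body: Callable[..., object]  # computation applied to the input values


def run_data_driven(threads: List[DThread],
                    initial: Dict[str, object]) -> Tuple[Dict[str, object], List[str]]:
    """Sequentially simulate data-driven scheduling with per-thread ready counts."""
    store: Dict[str, object] = dict(initial)     # single-assignment value store
    consumers: Dict[str, List[DThread]] = {}     # value name -> threads waiting on it
    ready_count: Dict[str, int] = {}
    for t in threads:
        pending = [v for v in t.inputs if v not in store]
        ready_count[t.name] = len(pending)       # inputs still missing
        for v in pending:
            consumers.setdefault(v, []).append(t)

    ready = deque(t for t in threads if ready_count[t.name] == 0)
    order: List[str] = []
    while ready:
        t = ready.popleft()
        if t.output in store:                    # enforce single assignment
            raise RuntimeError(f"single-assignment violation on {t.output}")
        store[t.output] = t.body(*(store[v] for v in t.inputs))
        order.append(t.name)
        # Producer -> consumer: each consumer now has one fewer missing input.
        for c in consumers.get(t.output, []):
            ready_count[c.name] -= 1
            if ready_count[c.name] == 0:
                ready.append(c)

    if len(order) != len(threads):
        raise RuntimeError("some threads never became ready (missing input or cycle)")
    return store, order


def tiled_matmul_threads(n: int) -> List[DThread]:
    """Threads for C = A @ B on an n x n grid of (scalar) tiles.

    Step k of element (i, j) produces a new value C{i}{j}_{k}, so partial
    sums are never overwritten (n < 10 keeps the names unambiguous).
    """
    threads = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                ins = [f"A{i}{k}", f"B{k}{j}"]
                if k == 0:
                    body = lambda a, b: a * b
                else:
                    ins.append(f"C{i}{j}_{k - 1}")
                    body = lambda a, b, c: c + a * b
                threads.append(DThread(f"gemm({i},{j},{k})", ins, f"C{i}{j}_{k}", body))
    return threads


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    init = {f"A{i}{k}": A[i][k] for i in range(2) for k in range(2)}
    init.update({f"B{k}{j}": B[k][j] for k in range(2) for j in range(2)})
    store, order = run_data_driven(tiled_matmul_threads(2), init)
    print([[store[f"C{i}{j}_1"] for j in range(2)] for i in range(2)])  # [[19, 22], [43, 50]]
```

The example chains the accumulation steps for each output element along k, so independent elements can run in any order while each chain stays ordered. Writing a fresh, versioned value (`C{i}{j}_{k}`) at every step, rather than updating in place, is how this sketch mimics single-assignment semantics.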