DiMP: Architectural Support for Direct Message Passing on Shared Memory Multi-cores
Rubén Titos-Gil, Oscar Palomar, O. Unsal, A. Cristal
2015 44th International Conference on Parallel Processing, September 2015
DOI: 10.1109/ICPP.2015.22 (https://doi.org/10.1109/ICPP.2015.22)
Citations: 2
Abstract
Thanks to programming approaches like actor-based models, message passing is regaining popularity outside large-scale scientific computing for building scalable distributed applications on many-core processors. Unfortunately, the mismatch between message passing models and the shared-memory hardware provided by today's commercial vendors results in suboptimal performance and loss of efficiency. This paper presents a set of architectural extensions that reduce the overheads incurred by message passing workloads running on shared-memory multi-core architectures. It describes the instruction set extensions and their hardware implementation. To facilitate programmability, the proposed extensions are used by a message passing library, allowing programs to take advantage of them transparently. As a proof of concept, we use a modified MPICH library and MPI programs to evaluate the proposal. Experimental results show that, on average, our proposal spends 60% fewer cycles performing data transfers in MPI functions, and reduces the L1 data cache misses in those functions to one fourth.