{"title":"Software-Based Buffering of Associative Operations on Random Memory Addresses","authors":"Matthias Hauck, M. Paradies, H. Fröning","doi":"10.1109/IPDPS.2019.00102","DOIUrl":null,"url":null,"abstract":"An important concept for indivisible updates in parallel computing are atomic operations. For most architectures, they also provide ordering guarantees, which in practice can hurt performance. For associative and commutative updates, in this paper we present software buffering techniques that overcome the problem of ordering by combining multiple updates in a temporary buffer and by prefetching addresses before updating them. As a result, our buffering techniques reduce contention and avoid unnecessary ordering constraints, in order to increase the amount of memory parallelism. We evaluate our techniques in different scenarios, including applications like histogram and graph computations, and reason about the applicability for standard systems and multi-socket systems.","PeriodicalId":403406,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS.2019.00102","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
An important concept for indivisible updates in parallel computing are atomic operations. For most architectures, they also provide ordering guarantees, which in practice can hurt performance. For associative and commutative updates, in this paper we present software buffering techniques that overcome the problem of ordering by combining multiple updates in a temporary buffer and by prefetching addresses before updating them. As a result, our buffering techniques reduce contention and avoid unnecessary ordering constraints, in order to increase the amount of memory parallelism. We evaluate our techniques in different scenarios, including applications like histogram and graph computations, and reason about the applicability for standard systems and multi-socket systems.