An Efficient Transaction-Based GPU Implementation of Minimum Spanning Forest Algorithm
Authors: Shayan Manoochehri, B. Goodarzi, D. Goswami
Venue: 2017 International Conference on High Performance Computing & Simulation (HPCS)
Publication date: 2017-07-01
DOI: 10.1109/HPCS.2017.100 (https://doi.org/10.1109/HPCS.2017.100)
General-purpose GPUs (GPGPUs) are ideal platforms for the parallel execution of applications with regular shared-memory access patterns. However, the majority of real-world multithreaded applications access shared memory with irregular patterns. The Minimum Spanning Forest (MSF) computation arises in many real-world applications. Among MSF algorithms, Boruvka's algorithm exposes the most parallelism; however, it is a challenging irregular algorithm to implement on GPUs. In this paper we show that a transaction-based design and implementation of Boruvka's algorithm on the GPU can handle some of the challenges arising from irregularity. First, we identify the hotspots of the algorithm that are its main bottlenecks: edge discovery and merge. The edge discovery phase is implemented using lock-free synchronization after extracting certain algebraic properties (e.g., monotonicity) of the computation. The merge phase, however, lacks such algebraic properties, and hence we use a Software Transactional Memory (STM) based synchronization method. STM offers ease of use by guaranteeing deadlock- and livelock-free behavior, in contrast to blocking lock-based synchronization. It also increases programmability by providing high-level abstractions for synchronization, which facilitate a natural transition from algorithm design to implementation. In addition, we employ several optimization techniques in different phases of the algorithm to achieve load balance and better GPU resource utilization. Experimental results show that our GPU-based implementation outperforms both the fastest sequential implementation and the existing STM-based implementation on multicore CPUs when tested on large-scale graphs with diverse densities.
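To make the two hotspots named above concrete, the following is a minimal sequential sketch of Boruvka's algorithm in Python. It is an illustration only, not the GPU implementation described in the paper. The function name `boruvka_msf`, the `(weight, u, v)` edge layout, and the union-find representation of components are assumptions introduced here. The comments mark where a parallel version would plausibly need a lock-free minimum update (edge discovery) and transactional synchronization (merge).

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int, int]  # (weight, u, v): hypothetical layout for this sketch


def boruvka_msf(num_vertices: int, edges: List[Edge]) -> List[Edge]:
    """Sequential Boruvka sketch; returns the edges of a minimum spanning forest."""
    parent = list(range(num_vertices))  # union-find forest over components

    def find(x: int) -> int:
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest: List[Edge] = []
    while True:
        # Edge discovery: each component keeps its lightest outgoing edge.
        # The stored value is a running minimum, so it only ever decreases
        # (monotonic); a parallel version could use an atomic-min update here.
        best: Dict[int, Tuple[int, int]] = {}
        for idx, (w, u, v) in enumerate(edges):
            cu, cv = find(u), find(v)
            if cu == cv:
                continue  # edge lies inside a single component
            key = (w, idx)  # the edge index breaks ties between equal weights
            for c in (cu, cv):
                if c not in best or key < best[c]:
                    best[c] = key
        if not best:
            break  # no component has an outgoing edge: the forest is complete

        # Merge: union the endpoints of every selected edge. In a parallel
        # setting, concurrent merges touch shared component state, which is
        # the phase the abstract assigns to STM. This loop is sequential and
        # needs no synchronization.
        for w, idx in sorted(set(best.values())):
            _, u, v = edges[idx]
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                forest.append((w, u, v))
    return forest
```

Calling `boruvka_msf(n, edges)` on a disconnected graph yields one spanning tree per connected component, which is why the result is a forest rather than a single tree. Breaking ties by edge index gives the edges a strict total order, so the edges selected in one round cannot form a cycle.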