Fast minimum spanning tree for large graphs on the GPU

Proceedings of the Conference on High Performance Graphics 2009; Pub Date: 2009-08-01; DOI: 10.1145/1572769.1572796

Vibhav Vineet, P. Harish, Suryakant Patidar, P J Narayanan


Citations: 126

Abstract


Graphics Processor Units are used for many general-purpose processing tasks because of the high compute power available on them. Regular, data-parallel algorithms map well to the SIMD architecture of current GPUs. Irregular algorithms on discrete structures such as graphs are harder to map to them. Efficient data-mapping primitives can play a crucial role in mapping such algorithms onto the GPU. In this paper, we present a minimum spanning tree algorithm on Nvidia GPUs under CUDA, as a recursive formulation of Borůvka's approach for undirected graphs. We implement it using scalable primitives such as scan, segmented scan and split. The irregular steps of supervertex formation and recursive graph construction are mapped to primitives such as split, which separates elements into categories based on vertex ids and edge weights. We obtain a 30 to 50 times speedup over the CPU implementation on most graphs and a 3 to 10 times speedup over our previous GPU implementation. We construct the minimum spanning tree of a graph with 5 million nodes and 30 million edges in under 1 second on one quarter of the Tesla S1070 GPU.
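To make the structure of Borůvka's approach concrete, the following is a minimal sequential sketch in Python. It is not the authors' CUDA implementation: the function names `boruvka_mst` and `segmented_min_scan`, the edge-list representation, and the union-find bookkeeping are all assumptions chosen for illustration. Each round lets every component (supervertex) pick its lightest outgoing edge and then merges components along the chosen edges. The small `segmented_min_scan` helper is a sequential reference for the segmented scan primitive named in the abstract; one plausible use of it is to find the minimum edge weight within each vertex's segment of an adjacency array.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight); hypothetical representation


def segmented_min_scan(values: Sequence[float], flags: Sequence[int]) -> List[float]:
    """Running minimum that restarts wherever flags[i] is 1 (segment start).

    Sequential reference only; a GPU version would compute this in parallel.
    """
    out: List[float] = []
    for value, flag in zip(values, flags):
        if flag or not out:
            out.append(value)                 # start of a new segment
        else:
            out.append(min(out[-1], value))   # continue the current segment
    return out


def boruvka_mst(num_vertices: int, edges: List[Edge]) -> List[Edge]:
    """Return the edges of a minimum spanning forest (sequential sketch)."""
    parent = list(range(num_vertices))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst: List[Edge] = []
    while True:
        # Step 1: every component picks its lightest outgoing edge.
        # Ties are broken by edge index so that the choices stay consistent.
        cheapest: Dict[int, int] = {}
        for idx, (u, v, w) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # edge lies inside one component
            for root in (ru, rv):
                best = cheapest.get(root)
                if best is None or (w, idx) < (edges[best][2], best):
                    cheapest[root] = idx
        if not cheapest:
            break  # no edge leaves any component: done

        # Step 2: merge components along the chosen edges (supervertex formation).
        for idx in sorted(set(cheapest.values())):
            u, v, w = edges[idx]
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                mst.append((u, v, w))
    return mst
```

Because the number of components at least halves in every round on a connected graph, the outer loop runs O(log V) times, which is part of what makes the approach attractive for data-parallel hardware. On a disconnected input the sketch returns a minimum spanning forest.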
