Parallel Vertex Cover Algorithms on GPUs

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Pub Date: 2022-04-21. DOI: 10.48550/arXiv.2204.10402

Peter Yamout, Karim Barada, Adnan Jaljuli, A. E. Mouawad, I. E. Hajj

Citations: 4

Abstract

Finding small vertex covers in a graph has applications in numerous domains such as scheduling, computational biology, telecommunication networks, artificial intelligence, social science, and many more. Two common formulations of the problem are Minimum Vertex Cover (MVC), which finds the smallest vertex cover in a graph, and Parameterized Vertex Cover (PVC), which finds a vertex cover whose size is less than or equal to some parameter $k$. Algorithms for both formulations involve traversing a search tree, which grows exponentially with the size of the graph or the value of $k$. Parallelizing the traversal of the vertex cover search tree on GPUs is challenging for multiple reasons. First, the search tree is a narrow binary tree, which makes it difficult to extract enough sub-trees to process in parallel and fully utilize the GPU's massively parallel execution resources. Second, the search tree is highly imbalanced, which makes load balancing across a massive number of parallel GPU workers especially challenging. Third, keeping all the intermediate state needed to traverse many sub-trees in parallel puts high pressure on the GPU's memory resources and may limit parallelism. To address these challenges, we propose an approach that traverses the vertex cover search tree in parallel on GPUs while handling dynamic load balancing. Each thread block traverses a different sub-tree using a local stack, while a global worklist balances the load so that all blocks remain busy. Blocks contribute branches of their sub-trees to the global worklist on an as-needed basis, and blocks that finish their sub-trees pick up new ones from the global worklist. We use degree arrays to represent intermediate graphs so that the representation is compact in memory, which avoids limiting parallelism, yet self-contained, which is necessary for the load balancing process. Our evaluation shows that, compared to approaches used in prior work, our hybrid approach of using local stacks and a global worklist substantially improves performance and reduces load imbalance, especially on difficult instances of the problem. Our implementations have been open-sourced to enable further research on parallel solutions to the vertex cover problem and other similar problems involving parallel traversal of narrow and highly imbalanced search trees.
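
To make the hybrid scheme in the abstract more concrete, the following is a minimal sequential sketch in Python, not the authors' CUDA implementation. It simulates several "blocks" in round-robin order, each with a local stack, sharing one global worklist. Every name here (`remove_vertex`, `expand`, `min_vertex_cover_size`, `num_blocks`, `threshold`) is hypothetical, and several details are assumptions made for illustration: a removed vertex is marked with degree -1, branching picks a maximum-degree vertex, a block donates a child whenever the worklist holds fewer than `threshold` entries, and no reduction rules are applied.

```python
from collections import deque

def remove_vertex(adj, deg, v):
    """Put v in the cover: decrement its live neighbours, then mark v removed (-1)."""
    for u in adj[v]:
        if deg[u] >= 0:
            deg[u] -= 1
    deg[v] = -1

def expand(adj, deg, size):
    """Branch on a maximum-degree vertex; return two child states, or [] at a leaf."""
    v = max(range(len(deg)), key=lambda x: deg[x])
    if deg[v] <= 0:
        return []                      # no edges left: removed vertices form a cover
    left = deg[:]                      # child 1: v joins the cover
    remove_vertex(adj, left, v)
    right = deg[:]                     # child 2: all live neighbours of v join the cover
    live = [u for u in adj[v] if deg[u] >= 0]
    for u in live:
        remove_vertex(adj, right, u)
    return [(left, size + 1), (right, size + len(live))]

def min_vertex_cover_size(adj, num_blocks=4, threshold=8):
    """Return the minimum vertex cover size of the graph given as adjacency lists."""
    n = len(adj)
    root = [len(adj[v]) for v in range(n)]      # degree array of the input graph
    worklist = deque([(root, 0)])               # global worklist shared by all blocks
    stacks = [[] for _ in range(num_blocks)]    # one local stack per simulated block
    best = n                                    # taking every vertex is always a cover
    while worklist or any(stacks):
        for stack in stacks:                    # round-robin stands in for parallelism
            if not stack:
                if not worklist:
                    continue                    # idle block: nothing to pick up yet
                stack.append(worklist.popleft())
            deg, size = stack.pop()
            if size >= best:
                continue                        # prune: cannot beat the best cover
            children = expand(adj, deg, size)
            if not children:
                best = size                     # leaf: strictly smaller cover found
                continue
            first, second = children
            stack.append(first)                 # keep one branch locally
            if len(worklist) < threshold:
                worklist.append(second)         # donate the other branch when needed
            else:
                stack.append(second)
    return best
```

Each search-tree node is just a degree array plus a cover size, while the adjacency lists stay shared and read-only. Because a node carries no reference to its parent, it can be moved from a local stack to the worklist and resumed by any other block, which is the "self-contained" property the abstract refers to. For example, `min_vertex_cover_size([[1, 4], [0, 2], [1, 3], [2, 4], [3, 0]])` evaluates the 5-cycle.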
