Towards Modeling the Performance of a Fast Connected Components Algorithm on Parallel Machines

Proceedings of the IEEE/ACM SC95 Conference. Pub Date: 1995-12-08. DOI: 10.1145/224170.224275

S. Lumetta, A. Krishnamurthy, D. Culler


Citations: 17

Abstract



We present and analyze a portable, high-performance algorithm for finding connected components on modern distributed memory multiprocessors. The algorithm is a hybrid of the classic DFS on the subgraph local to each processor and a variant of the Shiloach-Vishkin PRAM algorithm on the global collection of subgraphs. We implement the algorithm in Split-C and measure performance on the Cray T3D, the Meiko CS-2, and the Thinking Machines CM-5 using a class of graphs derived from cluster dynamics methods in computational physics. On a 256-processor Cray T3D, the implementation outperforms all previous solutions by an order of magnitude. A characterization of graph parameters allows us to select graphs that highlight key performance features. We study the effects of these parameters and machine characteristics on the balance of time between the local and global phases of the algorithm and find that edge density, surface-to-volume ratio, and relative communication cost dominate performance. By understanding the effect of machine characteristics on performance, the study sheds light on the impact of improvements in computational and/or communication performance on this challenging problem.
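The two-phase structure described in the abstract can be illustrated with a small sequential sketch. This is not the authors' Split-C implementation. It simulates the processors with an `owner` array, and its global phase uses a simplified hook-and-pointer-jump loop as a stand-in for the Shiloach-Vishkin variant, whose details the abstract does not give. All function and parameter names here are hypothetical.

```python
from collections import defaultdict


def local_dfs_labels(n, edges, owner):
    """Local phase: DFS over edges whose endpoints share an owner.

    Each vertex is labeled with the smallest vertex id in its local component.
    """
    adj = defaultdict(list)
    for u, v in edges:
        if owner[u] == owner[v]:  # ignore edges that cross processors
            adj[u].append(v)
            adj[v].append(u)
    label = [-1] * n
    for root in range(n):
        if label[root] != -1:
            continue
        label[root] = root
        stack = [root]
        while stack:  # iterative DFS
            u = stack.pop()
            for v in adj[u]:
                if label[v] == -1:
                    label[v] = root
                    stack.append(v)
    return label


def global_merge(label, edges, owner):
    """Global phase (assumed simplification): hooking plus pointer jumping.

    Operates on local component labels connected by cross-processor edges.
    """
    parent = {r: r for r in set(label)}
    remote = [(label[u], label[v]) for u, v in edges if owner[u] != owner[v]]
    changed = True
    while changed:
        changed = False
        # Hooking: attach the larger root under the smaller label.
        for a, b in remote:
            ra, rb = parent[a], parent[b]
            if ra != rb:
                hi, lo = max(ra, rb), min(ra, rb)
                if parent[hi] == hi:  # only hook current roots
                    parent[hi] = lo
                    changed = True
        # Pointer jumping: flatten every tree into a star.
        for x in parent:
            while parent[x] != parent[parent[x]]:
                parent[x] = parent[parent[x]]
    return [parent[l] for l in label]


def connected_components(n, edges, owner):
    """Run the local phase, then the global phase, and return final labels."""
    return global_merge(local_dfs_labels(n, edges, owner), edges, owner)


if __name__ == "__main__":
    # Two simulated processors own vertices 0-3 and 4-7; edge (2, 4) crosses.
    owner = [0, 0, 0, 0, 1, 1, 1, 1]
    edges = [(0, 1), (1, 2), (4, 5), (2, 4), (6, 7)]
    print(connected_components(8, edges, owner))
```

In this sketch the local phase does all the work for edges inside a partition, so the global phase only touches cross-partition edges. That split is why the abstract singles out edge density, surface-to-volume ratio, and relative communication cost as the factors that shift time between the two phases.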
