大型图形分析的预测建模和可扩展性分析

2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM) Pub Date : 2017-05-01 DOI:10.23919/INM.2017.7987265

Sourav Medya, L. Cherkasova, Ambuj K. Singh

{"title":"大型图形分析的预测建模和可扩展性分析","authors":"Sourav Medya, L. Cherkasova, Ambuj K. Singh","doi":"10.23919/INM.2017.7987265","DOIUrl":null,"url":null,"abstract":"Many HPC and modern large graph processing applications belong to a class of scale-out applications, where the application dataset is partitioned and processed by a cluster of machines. Assessing the application scalability is one of the primary goals during such application implementation. Typically, in the design phase, programmers are limited by a small size cluster available for their experiments. Therefore, predictive modeling is required for the analysis of the application scalability and its performance in a larger cluster. While in an increased size cluster, each node will process a smaller portion of the original dataset, a higher communication volume between a larger number of nodes may cripple the application scalability and provide diminishing performance benefits. One of the main challenges is the analysis of bandwidth demands due to an increased communication volume in a larger size cluster. In this paper1, we introduce a novel regression-based approach to assess the scalability and performance of a distributed memory program for execution in a large-scale cluster. Our solution involves 1) a limited set of traditional experiments performed in a small size cluster and 2) an additional set of similar experiments performed with an “interconnect bandwidth throttling” tool, which exposes the bandwidth impact on the application performance. These measurements are used in creating an ensemble of analytical models for performance and scalability analysis. Using a linear regression approach, step by step, we incorporate into the model the following important parameters: i) the number of cluster nodes and application processes, ii) the dataset size, and iii) interconnect bandwidth. We demonstrate our solution, its power, and accuracy using a popular Graph500 benchmark, which implements a Breadth First Search algorithm on large, synthetically generated graphs. By utilizing measurements collected in a 32-node cluster, we are able to project the program performance in a large size cluster with hundreds of nodes. The proposed approach and derived models help to provide an early feedback to programmers on the scalability and efficiency of their solution.","PeriodicalId":119633,"journal":{"name":"2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Predictive modeling and scalability analysis for large graph analytics\",\"authors\":\"Sourav Medya, L. Cherkasova, Ambuj K. Singh\",\"doi\":\"10.23919/INM.2017.7987265\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many HPC and modern large graph processing applications belong to a class of scale-out applications, where the application dataset is partitioned and processed by a cluster of machines. Assessing the application scalability is one of the primary goals during such application implementation. Typically, in the design phase, programmers are limited by a small size cluster available for their experiments. Therefore, predictive modeling is required for the analysis of the application scalability and its performance in a larger cluster. While in an increased size cluster, each node will process a smaller portion of the original dataset, a higher communication volume between a larger number of nodes may cripple the application scalability and provide diminishing performance benefits. One of the main challenges is the analysis of bandwidth demands due to an increased communication volume in a larger size cluster. In this paper1, we introduce a novel regression-based approach to assess the scalability and performance of a distributed memory program for execution in a large-scale cluster. Our solution involves 1) a limited set of traditional experiments performed in a small size cluster and 2) an additional set of similar experiments performed with an “interconnect bandwidth throttling” tool, which exposes the bandwidth impact on the application performance. These measurements are used in creating an ensemble of analytical models for performance and scalability analysis. Using a linear regression approach, step by step, we incorporate into the model the following important parameters: i) the number of cluster nodes and application processes, ii) the dataset size, and iii) interconnect bandwidth. We demonstrate our solution, its power, and accuracy using a popular Graph500 benchmark, which implements a Breadth First Search algorithm on large, synthetically generated graphs. By utilizing measurements collected in a 32-node cluster, we are able to project the program performance in a large size cluster with hundreds of nodes. The proposed approach and derived models help to provide an early feedback to programmers on the scalability and efficiency of their solution.\",\"PeriodicalId\":119633,\"journal\":{\"name\":\"2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM)\",\"volume\":\"49 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/INM.2017.7987265\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/INM.2017.7987265","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

许多HPC和现代大型图形处理应用程序属于横向扩展应用程序的一类，其中应用程序数据集由机器集群进行分区和处理。评估应用程序的可伸缩性是此类应用程序实现期间的主要目标之一。通常，在设计阶段，程序员受限于可用于实验的小型集群。因此，在更大的集群中，需要对应用程序的可伸缩性及其性能进行预测建模分析。虽然在规模增加的集群中，每个节点将处理原始数据集的较小部分，但大量节点之间较高的通信量可能会削弱应用程序的可伸缩性，并减少性能优势。其中一个主要挑战是分析由于更大规模集群中通信量增加而导致的带宽需求。在本文中，我们介绍了一种新的基于回归的方法来评估分布式内存程序在大规模集群中执行的可伸缩性和性能。我们的解决方案包括:1)在小型集群中执行一组有限的传统实验;2)使用“互连带宽节流”工具执行一组额外的类似实验，该工具暴露了带宽对应用程序性能的影响。这些度量用于创建用于性能和可伸缩性分析的分析模型集合。使用线性回归方法，我们一步一步地将以下重要参数纳入模型:i)集群节点和应用程序进程的数量，ii)数据集大小，以及iii)互连带宽。我们使用流行的Graph500基准来演示我们的解决方案，它的功能和准确性，该基准在大型合成图上实现了广度优先搜索算法。通过利用在32个节点集群中收集的测量数据，我们能够在具有数百个节点的大型集群中预测程序性能。建议的方法和派生的模型有助于向程序员提供关于其解决方案的可伸缩性和效率的早期反馈。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Predictive modeling and scalability analysis for large graph analytics

Many HPC and modern large graph processing applications belong to a class of scale-out applications, where the application dataset is partitioned and processed by a cluster of machines. Assessing the application scalability is one of the primary goals during such application implementation. Typically, in the design phase, programmers are limited by a small size cluster available for their experiments. Therefore, predictive modeling is required for the analysis of the application scalability and its performance in a larger cluster. While in an increased size cluster, each node will process a smaller portion of the original dataset, a higher communication volume between a larger number of nodes may cripple the application scalability and provide diminishing performance benefits. One of the main challenges is the analysis of bandwidth demands due to an increased communication volume in a larger size cluster. In this paper1, we introduce a novel regression-based approach to assess the scalability and performance of a distributed memory program for execution in a large-scale cluster. Our solution involves 1) a limited set of traditional experiments performed in a small size cluster and 2) an additional set of similar experiments performed with an “interconnect bandwidth throttling” tool, which exposes the bandwidth impact on the application performance. These measurements are used in creating an ensemble of analytical models for performance and scalability analysis. Using a linear regression approach, step by step, we incorporate into the model the following important parameters: i) the number of cluster nodes and application processes, ii) the dataset size, and iii) interconnect bandwidth. We demonstrate our solution, its power, and accuracy using a popular Graph500 benchmark, which implements a Breadth First Search algorithm on large, synthetically generated graphs. By utilizing measurements collected in a 32-node cluster, we are able to project the program performance in a large size cluster with hundreds of nodes. The proposed approach and derived models help to provide an early feedback to programmers on the scalability and efficiency of their solution.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM)

自引率

0.00%

发文量