NAS并行基准的体系结构要求和可扩展性

ACM/IEEE SC 1999 Conference (SC'99) Pub Date : 1900-01-01 DOI:10.1145/331532.331573

Frederick C. Wong, R. Martin, Remzi H. Arpaci-Dusseau, D. Culler

{"title":"NAS并行基准的体系结构要求和可扩展性","authors":"Frederick C. Wong, R. Martin, Remzi H. Arpaci-Dusseau, D. Culler","doi":"10.1145/331532.331573","DOIUrl":null,"url":null,"abstract":"We present a study of the architectural requirements and scalability of the NAS Parallel Benchmarks. Through direct measurements and simulations, we identify the factors which affect the scalability of benchmark codes on two relevant and distinct platforms; a cluster of workstations and a ccNUMA SGI Origin 2000. We find that the benefit of increased global cache size is pronounced in certain applications and often offsets the communication cost. By constructing the working set profile of the benchmarks, we are able to visualize the improvement of computational efficiency under constant-problem-size scaling. We also find that, while the Origin MPI has better point-to-point performance, the cluster MPI layer is more scalable with communication load. However, communication performance within the applications is often much lower than what would be achieved by micro-benchmarks. We show that the communication protocols used by MPI runtime library are influential to the communication performance in applications, and that the benchmark codes have a wide spectrum of communication requirements.","PeriodicalId":354898,"journal":{"name":"ACM/IEEE SC 1999 Conference (SC'99)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"129","resultStr":"{\"title\":\"Architectural Requirements and Scalability of the NAS Parallel Benchmarks\",\"authors\":\"Frederick C. Wong, R. Martin, Remzi H. Arpaci-Dusseau, D. Culler\",\"doi\":\"10.1145/331532.331573\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present a study of the architectural requirements and scalability of the NAS Parallel Benchmarks. Through direct measurements and simulations, we identify the factors which affect the scalability of benchmark codes on two relevant and distinct platforms; a cluster of workstations and a ccNUMA SGI Origin 2000. We find that the benefit of increased global cache size is pronounced in certain applications and often offsets the communication cost. By constructing the working set profile of the benchmarks, we are able to visualize the improvement of computational efficiency under constant-problem-size scaling. We also find that, while the Origin MPI has better point-to-point performance, the cluster MPI layer is more scalable with communication load. However, communication performance within the applications is often much lower than what would be achieved by micro-benchmarks. We show that the communication protocols used by MPI runtime library are influential to the communication performance in applications, and that the benchmark codes have a wide spectrum of communication requirements.\",\"PeriodicalId\":354898,\"journal\":{\"name\":\"ACM/IEEE SC 1999 Conference (SC'99)\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"129\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM/IEEE SC 1999 Conference (SC'99)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/331532.331573\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM/IEEE SC 1999 Conference (SC'99)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/331532.331573","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 129

摘要

我们对NAS并行基准的体系结构需求和可扩展性进行了研究。通过直接测量和仿真，我们确定了在两个相关且不同的平台上影响基准代码可扩展性的因素;一个工作站集群和一个ccNUMA SGI Origin 2000。我们发现，增加全局缓存大小的好处在某些应用程序中是明显的，并且通常抵消了通信成本。通过构建基准测试的工作集概要，我们能够直观地看到在恒定问题大小缩放下计算效率的提高。我们还发现，虽然Origin MPI具有更好的点对点性能，但集群MPI层具有更好的通信负载可扩展性。然而，应用程序内的通信性能通常比微基准测试所能达到的性能要低得多。研究表明，MPI运行库所使用的通信协议对应用程序中的通信性能有很大影响，并且基准代码具有广泛的通信需求。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Architectural Requirements and Scalability of the NAS Parallel Benchmarks

We present a study of the architectural requirements and scalability of the NAS Parallel Benchmarks. Through direct measurements and simulations, we identify the factors which affect the scalability of benchmark codes on two relevant and distinct platforms; a cluster of workstations and a ccNUMA SGI Origin 2000. We find that the benefit of increased global cache size is pronounced in certain applications and often offsets the communication cost. By constructing the working set profile of the benchmarks, we are able to visualize the improvement of computational efficiency under constant-problem-size scaling. We also find that, while the Origin MPI has better point-to-point performance, the cluster MPI layer is more scalable with communication load. However, communication performance within the applications is often much lower than what would be achieved by micro-benchmarks. We show that the communication protocols used by MPI runtime library are influential to the communication performance in applications, and that the benchmark codes have a wide spectrum of communication requirements.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM/IEEE SC 1999 Conference (SC'99)

自引率

0.00%

发文量