ALCF MPI Benchmarks: Understanding Machine-Specific Communication Behavior
V. Morozov, Jiayuan Meng, V. Vishwanath, J. Hammond, Kalyan Kumaran, M. Papka
2012 41st International Conference on Parallel Processing Workshops, September 10, 2012
DOI: 10.1109/ICPPW.2012.7
Citations: 12
Abstract
As systems grow larger and computation is spread across more nodes, efficient data communication becomes increasingly important for achieving high throughput and low power consumption in high performance computing systems. However, communication efficacy depends not only on application-specific communication patterns, but also on machine-specific communication subsystems, node architectures, and even the runtime communication libraries. In fact, different hardware systems lead to different tradeoffs with respect to communication mechanisms, which can affect the choice of application implementation. We present a set of MPI-based benchmarks to better understand the communication behavior of hardware systems and guide the performance tuning of scientific applications. We further apply these benchmarks to three clusters and present several interesting lessons from our experience.
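The paper's benchmark suite is not reproduced here, but a minimal MPI ping-pong sketch illustrates the kind of machine-specific point-to-point measurement it describes; the message sizes, iteration count, and rank pairing below are illustrative assumptions, not the authors' code.

```c
/* Minimal MPI ping-pong sketch: measures point-to-point latency and
 * bandwidth between ranks 0 and 1 for a range of message sizes.
 * Illustrative only; not the ALCF benchmark suite itself. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "need at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    const int iters = 1000;  /* round trips per message size (assumed) */
    for (int bytes = 8; bytes <= (1 << 20); bytes *= 4) {
        char *buf = malloc(bytes);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();
        if (rank == 0) {
            /* half the round-trip time approximates one-way latency */
            double one_way_us = (t1 - t0) / (2.0 * iters) * 1e6;
            double bw_mb_s = bytes / (one_way_us * 1e-6) / 1e6;
            printf("%8d bytes  %10.2f us  %10.2f MB/s\n", bytes, one_way_us, bw_mb_s);
        }
        free(buf);
    }

    MPI_Finalize();
    return 0;
}
```

Running two ranks on the same node versus on separate nodes (e.g., `mpirun -n 2 ./pingpong` with an appropriate host placement) exposes the machine-specific differences in intra-node and inter-node communication that benchmarks of this kind are meant to characterize.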