Analyzing Cost-Performance Tradeoffs of HPC Network Designs under Different Constraints using Simulations
A. Bhatele, Nikhil Jain, M. Mubarak, T. Gamblin
Proceedings of the 2019 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 2019-05-29
DOI: 10.1145/3316480.3325516 (https://doi.org/10.1145/3316480.3325516)
Identifying a suitable network topology and deciding its optimal configuration parameters are critical aspects of the overall HPC system design, procurement, and installation process. Typically, multiple network topology choices are compared under the balanced injection-to-global bandwidth criterion to identify the best candidate. However, deviating from this balanced criterion may not impact application performance adversely and is often done in practice due to other considerations such as monetary cost. In this paper, we identify different practical constraints that determine the number of nodes, routers, and links, and in turn, influence dollar costs and impact network design. We design network topologies under one or more such constraints, which represent different design points (iso-* analysis). We then perform a comprehensive, comparative evaluation of three scalable network topologies (dragonfly, express mesh, and fat-tree) enabled by parallel discrete-event simulations (PDES) of relevant HPC workloads. We identify network topologies that perform best under different iso-* configurations and compare their performance per dollar based on market data.
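The performance-per-dollar comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' cost model: the `NetworkDesign` class, the linear cost function (router count times a unit price plus link count times a unit price), and all names and numbers are hypothetical placeholders chosen for illustration. The abstract says only that node, router, and link counts influence dollar cost and that prices come from market data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkDesign:
    """One candidate design point (all fields are illustrative assumptions)."""
    name: str
    nodes: int                    # compute nodes attached to the network
    routers: int                  # number of routers/switches
    links: int                    # number of network links (cables)
    relative_performance: float   # hypothetical simulated score, higher is better


def network_cost(design: NetworkDesign, router_price: float, link_price: float) -> float:
    """Assumed linear cost model: routers and links priced per unit."""
    return design.routers * router_price + design.links * link_price


def performance_per_dollar(design: NetworkDesign, router_price: float, link_price: float) -> float:
    """Performance score divided by the assumed network cost."""
    return design.relative_performance / network_cost(design, router_price, link_price)


def rank_designs(designs, router_price: float, link_price: float):
    """Return designs sorted from best to worst performance per dollar."""
    return sorted(
        designs,
        key=lambda d: performance_per_dollar(d, router_price, link_price),
        reverse=True,
    )


# Placeholder design points with the same node count (an "iso-nodes" style comparison).
design_a = NetworkDesign("design_a", nodes=1024, routers=64, links=2048, relative_performance=1.0)
design_b = NetworkDesign("design_b", nodes=1024, routers=32, links=1024, relative_performance=0.8)

ranking = rank_designs([design_a, design_b], router_price=100.0, link_price=1.0)
```

With these placeholder inputs, the slower but cheaper `design_b` ranks first. This mirrors the abstract's point that departing from the balanced criterion can be worthwhile once monetary cost is considered.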