Liang Qin;Huaxi Gu;Xiaoshan Yu;Zheyi Cai;Junchen Liu
Journal of Optical Communications and Networking (IF 4.0; JCR Q1, Computer Science, Hardware & Architecture). Published 2024-03-22. DOI: 10.1364/JOCN.516031. https://ieeexplore.ieee.org/document/10536144/
Orchid: enhancing HPC interconnection networks through infrequent topology reconfiguration
Interconnection networks are key components of high-performance computing (HPC) systems. As HPC evolves towards the exascale era, providing sufficient bisection bandwidth between computing node pairs through oversubscription in traditional networks becomes prohibitively expensive and impractical. Over the past decade, several architectures leveraging optical circuit switches (OCSs) for dynamic link bandwidth allocation have gained traction. These architectures require frequent network topology reconfiguration to adapt to changing traffic demands. However, practical implementation remains hampered by the long reconfiguration delays inherent in OCS technology. We propose Orchid, an architecture that leverages OCSs to achieve infrequent topology reconfigurations, effectively addressing the problem of long reconfiguration delays. A key innovation of Orchid is its ability to extract stable traffic matrices from historical data. This functionality guides the reconfiguration of the topology without the need for adjustments with each traffic matrix, thereby enabling the sharing of OCS overhead over an extended timeframe. Furthermore, Orchid addresses potential congestion arising from unexpected traffic through the joint design of OCS configuration and routing, ensuring an even distribution of traffic across global links. Extensive experiments using real HPC application traces and synthetic traffic demonstrate that Orchid achieves significant performance improvements compared to existing HPC interconnection networks. Specifically, Orchid reduces packet delay by at least $3\times$ and enhances throughput by up to 60%.
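The abstract does not say how Orchid extracts a stable traffic matrix, so the following is only a minimal sketch of the general idea under stated assumptions. It uses an element-wise median over past traffic matrices as a stand-in for the extraction step, and a simple ratio to show why holding one topology longer spreads the OCS reconfiguration delay over more useful time. The function names `stable_traffic_matrix` and `amortized_overhead`, the sample data, and the choice of the median are all hypothetical and not taken from the paper.

```python
from statistics import median


def stable_traffic_matrix(history):
    """Element-wise median over equally sized square traffic matrices.

    Assumption: the median stands in for the (unspecified) extraction
    method; it suppresses one-off bursts and keeps recurring demand.
    """
    if not history:
        raise ValueError("history must contain at least one matrix")
    n = len(history[0])
    return [
        [median(m[i][j] for m in history) for j in range(n)]
        for i in range(n)
    ]


def amortized_overhead(reconfig_delay, hold_time):
    """Fraction of time lost to reconfiguration if a topology is held for hold_time."""
    return reconfig_delay / (reconfig_delay + hold_time)


# Hypothetical demand between three groups over three observation windows.
history = [
    [[0, 10, 1], [10, 0, 2], [1, 2, 0]],
    [[0, 12, 1], [11, 0, 2], [1, 3, 0]],
    [[0, 11, 50], [10, 0, 2], [1, 2, 0]],  # one-off burst from group 0 to group 2
]
stable = stable_traffic_matrix(history)
```

In this toy data the burst of 50 units in the third window does not affect the stable estimate for that pair, so a topology derived from `stable` would not be rebuilt around it. Unexpected traffic of this kind is what the abstract says Orchid handles through the joint design of OCS configuration and routing, which this sketch does not model.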
About the journal:
The scope of the Journal includes advances in the state-of-the-art of optical networking science, technology, and engineering. Both theoretical contributions (including new techniques, concepts, analyses, and economic studies) and practical contributions (including optical networking experiments, prototypes, and new applications) are encouraged. Subareas of interest include the architecture and design of optical networks, optical network survivability and security, software-defined optical networking, elastic optical networks, data and control plane advances, network management related innovation, and optical access networks. Enabling technologies and their applications are suitable topics only if the results are shown to directly impact optical networking beyond simple point-to-point networks.