{"title":"SROdcn: Scalable and Reconfigurable Optical DCN Architecture for High-Performance Computing","authors":"Kassahun Geresu;Huaxi Gu;Xiaoshan Yu;Meaad Fadhel;Hui Tian;Wenting Wei","doi":"10.1109/TCC.2024.3523433","DOIUrl":null,"url":null,"abstract":"Data Center Network (DCN) flexibility is critical for providing adaptive and dynamic bandwidth while optimizing network resources to manage variable traffic patterns generated by heterogeneous applications. To provide flexible bandwidth, this work proposes a machine learning approach with a new Scalable and Reconfigurable Optical DCN (SROdcn) architecture that maintains dynamic and non-uniform network traffic according to the scale of the high-performance optical interconnected DCN. Our main device is the Fiber Optical Switch (FOS), which offers competitive wavelength resolution. We propose a new top-of-rack (ToR) switch that utilizes Wavelength Selective Switches (WSS) to investigate Software-Defined Networking (SDN) with machine learning-enabled flow prediction for reconfigurable optical Data Center Networks (DCNs). Our architecture provides highly scalable and flexible bandwidth allocation. Results from Mininet experimental simulations demonstrate that under the management of an SDN controller, machine learning traffic flow prediction and graph connectivity allow each optical bandwidth to be automatically reconfigured according to variable traffic patterns. The average server-to-server packet delay performance of the reconfigurable SROdcn improves by 42.33% compared to inflexible interconnects. Furthermore, the network performance of flexible SROdcn servers shows up to a 49.67% latency improvement over the Passive Optical Data Center Architecture (PODCA), a 16.87% latency improvement over the optical OPSquare DCN, and up to a 71.13% latency improvement over the fat-tree network. Additionally, our optimized Unsupervised Machine Learning (ML-UnS) method for SROdcn outperforms Supervised Machine Learning (ML-S) and Deep Learning (DL).","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"13 1","pages":"245-258"},"PeriodicalIF":5.3000,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cloud Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10816648/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Data Center Network (DCN) flexibility is critical for providing adaptive, dynamic bandwidth while optimizing network resources to manage the variable traffic patterns generated by heterogeneous applications. To provide flexible bandwidth, this work proposes a machine learning approach built on a new Scalable and Reconfigurable Optical DCN (SROdcn) architecture that handles dynamic, non-uniform network traffic according to the scale of the high-performance optical interconnected DCN. The core device is the Fiber Optical Switch (FOS), which offers competitive wavelength resolution. We propose a new top-of-rack (ToR) switch based on Wavelength Selective Switches (WSS) and investigate Software-Defined Networking (SDN) with machine-learning-enabled flow prediction for reconfigurable optical DCNs. The architecture provides highly scalable and flexible bandwidth allocation. Mininet simulation results demonstrate that, under the management of an SDN controller, machine-learning traffic flow prediction and graph connectivity allow the optical bandwidth to be automatically reconfigured according to variable traffic patterns. The average server-to-server packet delay of the reconfigurable SROdcn improves by 42.33% compared to inflexible interconnects. Furthermore, flexible SROdcn servers show up to a 49.67% latency improvement over the Passive Optical Data Center Architecture (PODCA), a 16.87% latency improvement over the optical OPSquare DCN, and up to a 71.13% latency improvement over the fat-tree network. Additionally, our optimized Unsupervised Machine Learning (ML-UnS) method for SROdcn outperforms Supervised Machine Learning (ML-S) and Deep Learning (DL).
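To make the unsupervised flow-prediction idea more concrete, the sketch below shows one way an SDN controller could cluster per-flow statistics to flag heavy flows that might warrant extra optical bandwidth. This is a minimal illustration only: the feature set, function names, and the use of k-means are assumptions for exposition, not the paper's ML-UnS implementation.

```python
# Hypothetical sketch: unsupervised clustering of per-flow statistics to flag
# heavy flows that a controller could map to reconfigured WSS bandwidth.
# Feature choice and thresholds are illustrative assumptions, not the paper's code.
import numpy as np
from sklearn.cluster import KMeans

def classify_flows(flow_stats: np.ndarray, n_clusters: int = 2) -> np.ndarray:
    """Cluster flows by (bytes, packet rate, duration) and mark the heavy cluster.

    Returns an array of 0/1 labels where 1 marks flows in the cluster with the
    largest mean byte count (treated here as the 'heavy flow' class).
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(flow_stats)
    heavy_cluster = int(np.argmax(km.cluster_centers_[:, 0]))  # column 0 = bytes
    return (labels == heavy_cluster).astype(int)

if __name__ == "__main__":
    # Columns: bytes transferred, packets/s, duration (s) -- synthetic example data.
    stats = np.array([
        [1.2e9, 8.0e4, 30.0],   # likely heavy (elephant) flow
        [3.0e5, 2.0e2, 1.5],    # likely light (mouse) flow
        [9.8e8, 6.5e4, 25.0],
        [1.1e5, 1.5e2, 0.8],
    ])
    print(classify_flows(stats))  # e.g. [1 0 1 0]
```

In such a setup, the controller would periodically collect flow counters, re-run the clustering, and translate the heavy-flow labels into wavelength or bandwidth assignments on the WSS-based ToR switches; the actual policy in the paper may differ.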
Journal Introduction:
The IEEE Transactions on Cloud Computing (TCC) is dedicated to the multidisciplinary field of cloud computing. It is committed to the publication of articles that present innovative research ideas, application results, and case studies in cloud computing, focusing on key technical issues related to theory, algorithms, systems, applications, and performance.