Danlin Jia, Yiming Xie, Li Wang, Xiaoqian Zhang, Allen Yang, Xuebin Yao, Mahsa Bayati, Pradeep Subedi, B. Sheng, N. Mi
{"title":"SRC: Mitigate I/O Throughput Degradation in Network Congestion Control of Disaggregated Storage Systems","authors":"Danlin Jia, Yiming Xie, Li Wang, Xiaoqian Zhang, Allen Yang, Xuebin Yao, Mahsa Bayati, Pradeep Subedi, B. Sheng, N. Mi","doi":"10.1109/IPDPS54959.2023.00035","DOIUrl":null,"url":null,"abstract":"The industry has adopted disaggregated storage systems to provide high-quality services for hyper-scale architectures. This infrastructure enables organizations to access storage resources that can be independently managed, configured, and scaled. It is supported by the recent advances of all-flash arrays and NVMe-over-Fabric protocol, enabling remote access to NVMe devices over different network fabrics. A surge of research has been proposed to mitigate network congestion in traditional remote direct memory access protocol (RDMA). However, NVMe-oF raises new challenges in congestion control for disaggregated storage systems.In this work, we investigate the performance degradation of the read throughput on storage nodes caused by traditional network congestion control mechanisms. We design a storage-side rate control (SRC) to relieve network congestion while avoiding performance degradation on storage nodes. First, we design an I/O throughput control mechanism in the NVMe driver layer to enable throughput control on storage nodes. Second, we construct a throughput prediction model to learn a mapping function between workload characteristics and I/O throughput. Third, we deploy SRC on storage nodes to cooperate with traditional network congestion control on an NVMe-over-RDMA architecture. Finally, we evaluate SRC with varying workloads, SSD configurations, and network topologies. 
The experimental results show that SRC achieves significant performance improvement.","PeriodicalId":343684,"journal":{"name":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS54959.2023.00035","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The industry has adopted disaggregated storage systems to provide high-quality services for hyper-scale architectures. This infrastructure lets organizations access storage resources that can be independently managed, configured, and scaled. It is supported by recent advances in all-flash arrays and the NVMe-over-Fabrics (NVMe-oF) protocol, which enables remote access to NVMe devices over different network fabrics. A surge of research has sought to mitigate network congestion in traditional remote direct memory access (RDMA) protocols. However, NVMe-oF raises new challenges in congestion control for disaggregated storage systems. In this work, we investigate the degradation of read throughput on storage nodes caused by traditional network congestion control mechanisms. We design a storage-side rate control (SRC) mechanism to relieve network congestion while avoiding performance degradation on storage nodes. First, we design an I/O throughput control mechanism in the NVMe driver layer to enable throughput control on storage nodes. Second, we construct a throughput prediction model to learn a mapping function between workload characteristics and I/O throughput. Third, we deploy SRC on storage nodes to cooperate with traditional network congestion control on an NVMe-over-RDMA architecture. Finally, we evaluate SRC with varying workloads, SSD configurations, and network topologies. The experimental results show that SRC achieves significant performance improvement.
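The abstract does not say how the driver-layer throughput control is implemented. As a rough illustration of storage-side rate limiting in general, the sketch below uses a token bucket, a standard technique swapped in here as an assumption rather than the authors' design. The class `TokenBucket`, its methods, and all parameter values are hypothetical, and timestamps are passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical byte-rate limiter for read I/O; not the SRC mechanism itself."""

    rate_bytes_per_s: float  # current target read throughput
    burst_bytes: float       # maximum bytes that may be admitted back to back
    tokens: float = 0.0      # bytes currently available
    last_ts: float = 0.0     # timestamp of the last refill, in seconds

    def __post_init__(self) -> None:
        # Start with a full bucket so the first burst is admitted.
        self.tokens = self.burst_bytes

    def _refill(self, now: float) -> None:
        # Credit tokens for the elapsed time at the current rate, capped at the burst size.
        elapsed = max(0.0, now - self.last_ts)
        self.tokens = min(self.burst_bytes, self.tokens + elapsed * self.rate_bytes_per_s)
        self.last_ts = now

    def set_rate(self, now: float, rate_bytes_per_s: float) -> None:
        # Assumed hook: a controller lowers or raises the rate, for example
        # after a congestion signal. Refill first so the old rate is applied
        # to the time that has already passed.
        self._refill(now)
        self.rate_bytes_per_s = rate_bytes_per_s

    def try_admit(self, now: float, size_bytes: int) -> bool:
        # Admit the read only if enough byte credit has accumulated.
        self._refill(now)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_bytes_per_s=1000.0, burst_bytes=4096.0)
    print(bucket.try_admit(0.0, 4096))  # full bucket covers the first read
    print(bucket.try_admit(0.0, 4096))  # no credit left at the same instant
```

A caller would queue or delay any read for which `try_admit` returns `False`; how such a limit would interact with an NVMe driver's submission queues is outside what the abstract states.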