HyperDrive: Direct Network Telemetry Storage via Programmable Switches
Ziyuan Liu, Zhixiong Niu, Ran Shu, Wenxue Cheng, Lihua Yuan, Jacob Nelson, Dan R. K. Ports, Peng Cheng, Yongqiang Xiong
IEEE Transactions on Cloud Computing, vol. 13, no. 2, pp. 498-511. Published 2025-02-18. DOI: 10.1109/TCC.2025.3543477
Impact factor: 5.3. JCR: Q1 (Computer Science, Information Systems).
https://ieeexplore.ieee.org/document/10892081/
Citations: 0
Abstract
In cloud datacenter operations, telemetry and logs are indispensable, enabling essential services such as network diagnostics, auditing, and knowledge discovery. The escalating scale of data centers, coupled with increased bandwidth and finer-grained telemetry, results in an overwhelming volume of data. This proliferation poses significant storage challenges for telemetry systems. In this article, we introduce HyperDrive, an innovative system designed to efficiently store large volumes of telemetry and logs in data centers using programmable switches. This in-network approach effectively mitigates bandwidth bottlenecks commonly associated with traditional endpoint-based methods. To our knowledge, we are the first to use a programmable switch to directly control storage, bypassing the CPU to achieve the best performance. With merely 21% of a switch’s resources, our HyperDrive implementation showcases remarkable scalability and efficiency. Through rigorous evaluation, it has demonstrated linear scaling capabilities, efficiently managing 12 SSDs on a single server with minimal host overhead. In an eight-server testbed, HyperDrive achieved an impressive throughput of approximately 730 Gbps, underscoring its potential to transform data center telemetry and logging practices.
About the journal:
The IEEE Transactions on Cloud Computing (TCC) is dedicated to the multidisciplinary field of cloud computing. It is committed to the publication of articles that present innovative research ideas, application results, and case studies in cloud computing, focusing on key technical issues related to theory, algorithms, systems, applications, and performance.