{"title":"现代数据中心工作负载生成方法:来自阿里巴巴的观点","authors":"Yi Liang , Nianyi Ruan , Lan Yi , Xing Su","doi":"10.1016/j.tbench.2024.100164","DOIUrl":null,"url":null,"abstract":"<div><p>Modern data centers provide the foundational infrastructure of cloud computing. Workload generation, which involves simulating or constructing tasks and transactions to replicate the actual resource usage patterns of real-world systems or applications, plays essential role for efficient resource management in these centers. Data center traces, rich in information about workload execution and resource utilization, are thus ideal data for workload generation. Traditional traces provide detailed temporal resource usage data to enable fine-grained workload generation. However, modern data centers tend to favor tracing statistical metrics to reduce overhead. Therefore the accurate reconstruction of temporal resource consumption without detailed, temporized trace information become a major challenge for trace-based workload generation. To address this challenge, we propose STWGEN, a novel method that leverages statistical trace data for workload generation. STWGEN is specifically designed to generate the batch task workloads based on Alibaba trace. STWGEN contains two key components: a suite of C program-based flexible workload building blocks and a heuristic strategy to assemble building blocks for workload generation. Both components are carefully designed to reproduce synthetic batch tasks that closely replicate the observed resource usage patterns in a representative data center. Experimental results demonstrate that STWGEN outperforms state-of-the-art workload generation methods as it emulates workload-level and machine-level resource usage in much higher accuracy.</p></div>","PeriodicalId":100155,"journal":{"name":"BenchCouncil Transactions on Benchmarks, Standards and Evaluations","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2772485924000164/pdfft?md5=dc97b50be70f18c4e64b66906a378a03&pid=1-s2.0-S2772485924000164-main.pdf","citationCount":"0","resultStr":"{\"title\":\"An approach to workload generation for modern data centers: A view from Alibaba trace\",\"authors\":\"Yi Liang , Nianyi Ruan , Lan Yi , Xing Su\",\"doi\":\"10.1016/j.tbench.2024.100164\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Modern data centers provide the foundational infrastructure of cloud computing. Workload generation, which involves simulating or constructing tasks and transactions to replicate the actual resource usage patterns of real-world systems or applications, plays essential role for efficient resource management in these centers. Data center traces, rich in information about workload execution and resource utilization, are thus ideal data for workload generation. Traditional traces provide detailed temporal resource usage data to enable fine-grained workload generation. However, modern data centers tend to favor tracing statistical metrics to reduce overhead. Therefore the accurate reconstruction of temporal resource consumption without detailed, temporized trace information become a major challenge for trace-based workload generation. To address this challenge, we propose STWGEN, a novel method that leverages statistical trace data for workload generation. STWGEN is specifically designed to generate the batch task workloads based on Alibaba trace. STWGEN contains two key components: a suite of C program-based flexible workload building blocks and a heuristic strategy to assemble building blocks for workload generation. Both components are carefully designed to reproduce synthetic batch tasks that closely replicate the observed resource usage patterns in a representative data center. Experimental results demonstrate that STWGEN outperforms state-of-the-art workload generation methods as it emulates workload-level and machine-level resource usage in much higher accuracy.</p></div>\",\"PeriodicalId\":100155,\"journal\":{\"name\":\"BenchCouncil Transactions on Benchmarks, Standards and Evaluations\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S2772485924000164/pdfft?md5=dc97b50be70f18c4e64b66906a378a03&pid=1-s2.0-S2772485924000164-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BenchCouncil Transactions on Benchmarks, Standards and Evaluations\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2772485924000164\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BenchCouncil Transactions on Benchmarks, Standards and Evaluations","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772485924000164","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
摘要
现代数据中心是云计算的基础架构。工作负载生成涉及模拟或构建任务和事务,以复制现实世界中系统或应用的实际资源使用模式,对这些中心的高效资源管理起着至关重要的作用。因此,数据中心跟踪信息中含有丰富的工作负载执行和资源利用信息,是工作负载生成的理想数据。传统的跟踪可提供详细的时间资源使用数据,从而实现细粒度的工作负载生成。然而,现代数据中心倾向于采用跟踪统计指标来减少开销。因此,在没有详细的时间化跟踪信息的情况下,如何准确重建时间资源消耗成为基于跟踪的工作负载生成所面临的一大挑战。为了应对这一挑战,我们提出了 STWGEN,一种利用统计跟踪数据生成工作负载的新方法。STWGEN 专为生成基于阿里巴巴跟踪的批处理任务工作负载而设计。STWGEN 包含两个关键组件:一套基于 C 程序的灵活工作负载构建模块和一种启发式策略,用于组合构建模块以生成工作负载。这两个组件都经过精心设计,用于重现合成批处理任务,这些任务与在代表性数据中心观察到的资源使用模式密切相关。实验结果表明,STWGEN 超越了最先进的工作负载生成方法,因为它能更准确地模拟工作负载级和机器级资源使用情况。
An approach to workload generation for modern data centers: A view from Alibaba trace
Modern data centers provide the foundational infrastructure of cloud computing. Workload generation, which involves simulating or constructing tasks and transactions to replicate the actual resource usage patterns of real-world systems or applications, plays essential role for efficient resource management in these centers. Data center traces, rich in information about workload execution and resource utilization, are thus ideal data for workload generation. Traditional traces provide detailed temporal resource usage data to enable fine-grained workload generation. However, modern data centers tend to favor tracing statistical metrics to reduce overhead. Therefore the accurate reconstruction of temporal resource consumption without detailed, temporized trace information become a major challenge for trace-based workload generation. To address this challenge, we propose STWGEN, a novel method that leverages statistical trace data for workload generation. STWGEN is specifically designed to generate the batch task workloads based on Alibaba trace. STWGEN contains two key components: a suite of C program-based flexible workload building blocks and a heuristic strategy to assemble building blocks for workload generation. Both components are carefully designed to reproduce synthetic batch tasks that closely replicate the observed resource usage patterns in a representative data center. Experimental results demonstrate that STWGEN outperforms state-of-the-art workload generation methods as it emulates workload-level and machine-level resource usage in much higher accuracy.