虚拟化环境下数据密集型工作负载的自适应性能建模

ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS) Pub Date : 2021-03-09 DOI:10.1145/3442696

Hosein Mohammadi Makrani, H. Sayadi, Najmeh Nazari, Sai Manoj Pudukotai Dinakarrao, Avesta Sasan, T. Mohsenin, S. Rafatirad, H. Homayoun

{"title":"虚拟化环境下数据密集型工作负载的自适应性能建模","authors":"Hosein Mohammadi Makrani, H. Sayadi, Najmeh Nazari, Sai Manoj Pudukotai Dinakarrao, Avesta Sasan, T. Mohsenin, S. Rafatirad, H. Homayoun","doi":"10.1145/3442696","DOIUrl":null,"url":null,"abstract":"The processing of data-intensive workloads is a challenging and time-consuming task that often requires massive infrastructure to ensure fast data analysis. The cloud platform is the most popular and powerful scale-out infrastructure to perform big data analytics and eliminate the need to maintain expensive and high-end computing resources at the user side. The performance and the cost of such infrastructure depend on the overall server configuration, such as processor, memory, network, and storage configurations. In addition to the cost of owning or maintaining the hardware, the heterogeneity in the server configuration further expands the selection space, leading to non-convergence. The challenge is further exacerbated by the dependency of the application’s performance on the underlying hardware. Despite an increasing interest in resource provisioning, few works have been done to develop accurate and practical models to proactively predict the performance of data-intensive applications corresponding to the server configuration and provision a cost-optimal configuration online. In this work, through a comprehensive real-system empirical analysis of performance, we address these challenges by introducing ProMLB: a proactive machine-learning-based methodology for resource provisioning. We first characterize diverse types of data-intensive workloads across different types of server architectures. The characterization aids in accurately capture applications’ behavior and train a model for prediction of their performance. Then, ProMLB builds a set of cross-platform performance models for each application. Based on the developed predictive model, ProMLB uses an optimization technique to distinguish close-to-optimal configuration to minimize the product of execution time and cost. Compared to the oracle scheduler, ProMLB achieves 91% accuracy in terms of application-resource matching. On average, ProMLB improves the performance and resource utilization by 42.6% and 41.1%, respectively, compared to baseline scheduler. Moreover, ProMLB improves the performance per cost by 2.5× on average.","PeriodicalId":105474,"journal":{"name":"ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Adaptive Performance Modeling of Data-intensive Workloads for Resource Provisioning in Virtualized Environment\",\"authors\":\"Hosein Mohammadi Makrani, H. Sayadi, Najmeh Nazari, Sai Manoj Pudukotai Dinakarrao, Avesta Sasan, T. Mohsenin, S. Rafatirad, H. Homayoun\",\"doi\":\"10.1145/3442696\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The processing of data-intensive workloads is a challenging and time-consuming task that often requires massive infrastructure to ensure fast data analysis. The cloud platform is the most popular and powerful scale-out infrastructure to perform big data analytics and eliminate the need to maintain expensive and high-end computing resources at the user side. The performance and the cost of such infrastructure depend on the overall server configuration, such as processor, memory, network, and storage configurations. In addition to the cost of owning or maintaining the hardware, the heterogeneity in the server configuration further expands the selection space, leading to non-convergence. The challenge is further exacerbated by the dependency of the application’s performance on the underlying hardware. Despite an increasing interest in resource provisioning, few works have been done to develop accurate and practical models to proactively predict the performance of data-intensive applications corresponding to the server configuration and provision a cost-optimal configuration online. In this work, through a comprehensive real-system empirical analysis of performance, we address these challenges by introducing ProMLB: a proactive machine-learning-based methodology for resource provisioning. We first characterize diverse types of data-intensive workloads across different types of server architectures. The characterization aids in accurately capture applications’ behavior and train a model for prediction of their performance. Then, ProMLB builds a set of cross-platform performance models for each application. Based on the developed predictive model, ProMLB uses an optimization technique to distinguish close-to-optimal configuration to minimize the product of execution time and cost. Compared to the oracle scheduler, ProMLB achieves 91% accuracy in terms of application-resource matching. On average, ProMLB improves the performance and resource utilization by 42.6% and 41.1%, respectively, compared to baseline scheduler. Moreover, ProMLB improves the performance per cost by 2.5× on average.\",\"PeriodicalId\":105474,\"journal\":{\"name\":\"ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS)\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-03-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3442696\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3442696","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 9

摘要

处理数据密集型工作负载是一项具有挑战性且耗时的任务，通常需要大量基础设施来确保快速的数据分析。云平台是执行大数据分析和消除在用户端维护昂贵和高端计算资源的需要的最流行和最强大的横向扩展基础设施。这种基础设施的性能和成本取决于服务器的整体配置，例如处理器、内存、网络和存储配置。除了拥有或维护硬件的成本之外，服务器配置的异构性进一步扩大了选择空间，导致不收敛。应用程序的性能对底层硬件的依赖性进一步加剧了这一挑战。尽管人们对资源配置越来越感兴趣，但很少有人开发准确实用的模型来主动预测与服务器配置相对应的数据密集型应用程序的性能，并在线提供成本最优配置。在这项工作中，通过对性能进行全面的实际系统实证分析，我们通过引入ProMLB来解决这些挑战:ProMLB是一种基于机器学习的主动资源配置方法。我们首先描述跨不同类型的服务器架构的不同类型的数据密集型工作负载。这种特性有助于准确地捕捉应用程序的行为，并训练一个模型来预测它们的性能。然后，ProMLB为每个应用程序构建一组跨平台性能模型。基于开发的预测模型，ProMLB使用优化技术来区分接近最优的配置，以最小化执行时间和成本的乘积。与oracle调度器相比，ProMLB在应用程序资源匹配方面达到了91%的准确性。平均而言，与基准调度器相比，ProMLB分别提高了42.6%和41.1%的性能和资源利用率。此外，ProMLB的每成本性能平均提高了2.5倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Adaptive Performance Modeling of Data-intensive Workloads for Resource Provisioning in Virtualized Environment

The processing of data-intensive workloads is a challenging and time-consuming task that often requires massive infrastructure to ensure fast data analysis. The cloud platform is the most popular and powerful scale-out infrastructure to perform big data analytics and eliminate the need to maintain expensive and high-end computing resources at the user side. The performance and the cost of such infrastructure depend on the overall server configuration, such as processor, memory, network, and storage configurations. In addition to the cost of owning or maintaining the hardware, the heterogeneity in the server configuration further expands the selection space, leading to non-convergence. The challenge is further exacerbated by the dependency of the application’s performance on the underlying hardware. Despite an increasing interest in resource provisioning, few works have been done to develop accurate and practical models to proactively predict the performance of data-intensive applications corresponding to the server configuration and provision a cost-optimal configuration online. In this work, through a comprehensive real-system empirical analysis of performance, we address these challenges by introducing ProMLB: a proactive machine-learning-based methodology for resource provisioning. We first characterize diverse types of data-intensive workloads across different types of server architectures. The characterization aids in accurately capture applications’ behavior and train a model for prediction of their performance. Then, ProMLB builds a set of cross-platform performance models for each application. Based on the developed predictive model, ProMLB uses an optimization technique to distinguish close-to-optimal configuration to minimize the product of execution time and cost. Compared to the oracle scheduler, ProMLB achieves 91% accuracy in terms of application-resource matching. On average, ProMLB improves the performance and resource utilization by 42.6% and 41.1%, respectively, compared to baseline scheduler. Moreover, ProMLB improves the performance per cost by 2.5× on average.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS)

自引率

0.00%

发文量