Explaining Wide Area Data Transfer Performance

Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing. Pub Date: 2017-06-26. DOI: 10.1145/3078597.3078605

Zhengchun Liu, Prasanna Balaprakash, R. Kettimuthu, Ian T Foster


Citations: 39

Abstract

Disk-to-disk wide-area file transfers involve many subsystems and tunable application parameters that pose significant challenges for bottleneck detection, system optimization, and performance prediction. Performance models can be used to address these challenges but have not proved generally usable because they require extensive online experiments to characterize subsystems. We show here how to overcome the need for such experiments by applying machine learning methods to historical data to estimate parameters for predictive models. Starting with log data for millions of Globus transfers involving billions of files and hundreds of petabytes, we engineer features for endpoint CPU load, network interface card load, and transfer characteristics, and we use these features in both linear and nonlinear models of transfer performance. We show that the resulting models have high explanatory power. For a representative set of 30,653 transfers over 30 heavily used source-destination pairs ("edges"), totaling 2,053 TB in 46.6 million files, we obtain median absolute percentage prediction errors (MdAPE) of 7.0% and 4.6% when using distinct linear and nonlinear models per edge, respectively; when using a single nonlinear model for all edges, we obtain an MdAPE of 7.8%. Our work broadens understanding of factors that influence file transfer rate by clarifying relationships between achieved transfer rates, transfer characteristics, and competing load. Our predictions can be used for distributed workflow scheduling and optimization, and our features can also be used for optimization and explanation.
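
For readers unfamiliar with the error metric quoted above, the following is a minimal sketch of how a median absolute percentage error could be computed from observed and predicted transfer rates. The function name `mdape` and the sample rates are illustrative assumptions, not values or code from the paper.

```python
from statistics import median


def mdape(actual, predicted):
    """Median absolute percentage error, in percent.

    `actual` and `predicted` are equal-length sequences of transfer
    rates; every actual rate must be nonzero.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Per-transfer absolute error relative to the observed rate.
    errors = [abs(a - p) / abs(a) * 100.0 for a, p in zip(actual, predicted)]
    return median(errors)


# Hypothetical observed and predicted rates (MB/s), for illustration only.
observed_rates = [100.0, 200.0, 400.0, 800.0]
predicted_rates = [110.0, 190.0, 400.0, 600.0]
print(f"MdAPE: {mdape(observed_rates, predicted_rates):.1f}%")
```

Because the median is used rather than the mean, a single badly mispredicted transfer (the last one in this sample) has limited influence on the reported figure.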

