{"title":"A Framework to Evaluate the Effectiveness of Different Load Testing Analysis Techniques","authors":"Ruoyu Gao, Z. Jiang, C. Barna, Marin Litoiu","doi":"10.1109/ICST.2016.9","DOIUrl":null,"url":null,"abstract":"Large-scale software systems like Amazon and eBay must be load tested to ensure they can handle hundreds and millions of current requests in the field. Load testing usually lasts for a few hours or even days and generates large volumes of system behavior data (execution logs and counters). This data must be properly analyzed to check whether there are any performance problems in a load test. However, the sheer size of the data prevents effective manual analysis. In addition, unlike functional tests, there is usually no test oracle associated with a load test. To cope with these challenges, there have been many analysis techniques proposed to automatically detect problems in a load test by comparing the behavior of the current test against previous test(s). Unfortunately, none of these techniques compare their performance against each other. In this paper, we have proposed a framework, which evaluates and compares the effectiveness of different test analysis techniques. We have evaluated a total of 23 test analysis techniques using load testing data from three open source systems. Based on our experiments, we have found that all the test analysis techniques can effectively build performance models using data from both buggy or non-buggy tests and flag the performance deviations between them. It is more cost-effective to compare the current test against two recent previous test(s), while using testing data collected under longer sampling intervals (≥180 seconds). Among all the test analysis techniques, Control Chart, Descriptive Statistics and Regression Tree yield the best performance. Our evaluation framework and findings can be very useful for load testing practitioners and researchers. To encourage further research on this topic, we have made our testing data publicity available to download.","PeriodicalId":155554,"journal":{"name":"2016 IEEE International Conference on Software Testing, Verification and Validation (ICST)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE International Conference on Software Testing, Verification and Validation (ICST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICST.2016.9","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 16
Abstract
Large-scale software systems like Amazon and eBay must be load tested to ensure that they can handle hundreds of millions of concurrent requests in the field. A load test usually lasts for hours or even days and generates large volumes of system behavior data (execution logs and performance counters). This data must be properly analyzed to check whether the load test exposed any performance problems. However, the sheer size of the data prevents effective manual analysis, and, unlike functional tests, a load test usually has no associated test oracle. To cope with these challenges, many analysis techniques have been proposed that automatically detect problems in a load test by comparing the behavior of the current test against previous test(s). Unfortunately, the performance of these techniques has never been compared head to head. In this paper, we propose a framework that evaluates and compares the effectiveness of different test analysis techniques. We evaluated a total of 23 test analysis techniques using load testing data from three open source systems. Based on our experiments, we found that all of the test analysis techniques can effectively build performance models using data from both buggy and non-buggy tests and flag the performance deviations between them. It is most cost-effective to compare the current test against the two most recent previous tests, using testing data collected at longer sampling intervals (≥180 seconds). Among all the test analysis techniques, Control Chart, Descriptive Statistics, and Regression Tree yield the best performance. Our evaluation framework and findings can be very useful for load testing practitioners and researchers. To encourage further research on this topic, we have made our testing data publicly available for download.
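To make the compare-against-previous-tests approach concrete, the sketch below illustrates one of the best-performing techniques the abstract names, a Control Chart analysis: control limits are derived from the counter values of previous (passing) load tests, and the current test is flagged when too many of its samples fall outside those limits. This is a minimal illustration of the general technique, not the paper's implementation; the response-time counter, the ±3σ limits, and the 10% violation threshold are assumptions chosen for the example.

```python
from statistics import mean, stdev

def control_chart_violation_ratio(baseline_runs, current_run, sigma=3.0):
    """Fraction of current-test samples falling outside control limits
    derived from baseline (previous) load test runs."""
    # Pool counter samples from all previous tests to build the performance model.
    baseline = [v for run in baseline_runs for v in run]
    center = mean(baseline)
    spread = stdev(baseline)
    lower, upper = center - sigma * spread, center + sigma * spread
    violations = sum(1 for v in current_run if not (lower <= v <= upper))
    return violations / len(current_run)

# Hypothetical response-time samples (ms), one value per sampling interval.
previous_tests = [
    [102, 98, 105, 110, 99, 101],   # earlier passing run
    [97, 103, 108, 100, 104, 106],  # most recent passing run
]
current_test = [101, 180, 175, 99, 190, 103]

ratio = control_chart_violation_ratio(previous_tests, current_test)
# Flag the current test as problematic if more than 10% of its samples
# violate the control limits (threshold is an assumption for this sketch).
print(f"violation ratio: {ratio:.2f} -> {'problem' if ratio > 0.10 else 'ok'}")
```

In this toy run, the three elevated response-time samples push the violation ratio to 0.50, so the current test would be flagged as a performance deviation relative to the two baseline runs.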