Automatically Identifying Shared Root Causes of Test Breakages in SAP HANA

2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)
Publication date: 2022-05-01
DOI: 10.1145/3510457.3513051

Gabin An, Juyeon Yoon, Jeongju Sohn, Jingun Hong, Dongwon Hwang, Shin Yoo

Citations: 5

Abstract

Continuous Integration (CI) of a large-scale software system such as SAP HANA can produce a non-trivial number of test breakages. Each new breakage from the daily runs needs to be manually inspected, triaged, and eventually assigned to developers for debugging. However, not all new breakages are unique: some test breakages share the same root cause, and human error can produce duplicate bug tickets for the same root cause. Automatically identifying breakages with shared root causes can significantly reduce the cost of the (typically manual) post-breakage steps. This paper investigates multiple similarity functions between test breakages to assist and automate the identification of test breakages that share the same root cause. To support this automation, we consider multiple information sources: static (i.e., the code itself), historical (i.e., whether the test results have changed in a similar way in the past), and dynamic (i.e., whether the coverage of test cases is similar). We evaluate a total of 27 individual similarity functions using real-world CI data of SAP HANA from a six-month period. Further, using these individual similarity functions as input features, we construct a classification model that predicts whether two test breakages share the same root cause. When trained using ground truth labels extracted from the issue tracker of SAP HANA, our model achieves an F1 score of 0.743 when evaluated on a set of unseen test breakages collected over three months. Our results show that a classification model based on test similarity functions can successfully support the bug triage stage of a CI pipeline.
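The abstract describes pairs of breakages being compared with static, historical, and dynamic similarity functions, with the resulting scores fed to a classifier that is evaluated by F1 score. The following is a minimal Python sketch of that pairwise setup under stated assumptions: the `Breakage` record, its fields, and the three similarity functions (token Jaccard, history agreement, coverage Jaccard) are hypothetical illustrations, not the paper's 27 functions. In particular, `predict_same_cause` replaces the paper's trained classification model with a simple mean-threshold rule purely to keep the example self-contained.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class Breakage:
    """Hypothetical record of one test breakage (field names are illustrative)."""
    test_id: str
    covered: FrozenSet[str]   # dynamic: code elements executed by the failing test
    tokens: FrozenSet[str]    # static: identifiers appearing in the test source
    history: Tuple[int, ...]  # historical: past results, 1 = pass, 0 = fail


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def history_agreement(h1: Sequence[int], h2: Sequence[int]) -> float:
    """Fraction of the most recent shared builds in which both tests had the same result."""
    n = min(len(h1), len(h2))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(h1[-n:], h2[-n:])) / n


def pair_features(b1: Breakage, b2: Breakage) -> List[float]:
    """Hypothetical feature vector for a pair: [static, historical, dynamic]."""
    return [
        jaccard(b1.tokens, b2.tokens),
        history_agreement(b1.history, b2.history),
        jaccard(b1.covered, b2.covered),
    ]


def predict_same_cause(features: List[float], threshold: float = 0.5) -> bool:
    """Stand-in for a trained classifier: mean feature score compared to a threshold."""
    return sum(features) / len(features) >= threshold


def f1_score(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """F1 over binary labels (True = the pair shares a root cause)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration): b1 and b2 are meant to share a root cause.
b1 = Breakage("t1", frozenset({"f", "g", "h"}), frozenset({"insert", "table"}), (1, 1, 0))
b2 = Breakage("t2", frozenset({"f", "g"}), frozenset({"insert", "index"}), (1, 1, 0))
b3 = Breakage("t3", frozenset({"x"}), frozenset({"backup"}), (1, 0, 1))

pairs = list(combinations([b1, b2, b3], 2))
predicted = [predict_same_cause(pair_features(x, y)) for x, y in pairs]
actual = [True, False, False]  # hypothetical ground-truth labels for the three pairs
print(f1_score(predicted, actual))  # prints 1.0
```

Treating each pair of breakages as one classification instance keeps the problem binary, which is why a pairwise metric such as F1 fits the evaluation. A real setup would learn the decision boundary from labelled pairs (the paper uses issue-tracker labels) rather than fixing a threshold by hand.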
