Automatic prediction of bug fixing effort measured by code churn size

Proceedings of the 5th International Workshop on Software Mining Pub Date : 2016-09-03 DOI:10.1145/2975961.2975964

Ferdian Thung


Citations: 8

Abstract

During software maintenance, developers often receive many bug reports, and project managers must allocate limited resources to resolve them. To help project managers, past studies have proposed techniques that predict the amount of time that passes between a bug report being submitted and its resolution. However, this time period might not be representative of the actual development effort, as developers might not work on the bug right away or continuously. In the open source development setting, developers are volunteers and might not devote their full working hours to fixing a bug in a particular open source project. In the industrial setting, developers might be asked to perform various tasks aside from fixing a particular bug. In this work, we estimate bug fixing effort in terms of code churn size. Code churn size is the number of lines of code that are added, deleted, or modified to fix the bug. Lines of code have traditionally been used to estimate effort. However, no past studies have proposed techniques to automatically predict code churn size. In this work, using code churn size as an estimate of bug fixing effort, we propose a classification-based approach that predicts, given a bug report, whether the bug fixing effort would be high or low. We have evaluated our approach on 1,029 bug reports from hadoop-common and struts2. The results are promising: our approach achieves an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.612 when predicting bug fixing effort in terms of lines of code churned, which is a 22.4% improvement over a baseline.
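
To make the definition of code churn size concrete, the following is a minimal sketch of how it could be counted from a unified diff and turned into a high/low label. It is an illustration only, not the paper's implementation: the function names `churn_size` and `label_effort`, the convention of counting a modified line as one deletion plus one addition, and the use of the median as the high/low threshold are all assumptions, since the abstract does not specify them.

```python
from statistics import median


def churn_size(unified_diff: str) -> int:
    """Count added plus deleted lines in a unified diff.

    Assumption: a modified line appears as one '-' line and one '+' line,
    so it contributes 2 to the count.
    Limitation: lines starting with '---' or '+++' are treated as file
    headers, so a changed line whose content begins with '--' or '++'
    would be missed by this simple sketch.
    """
    churn = 0
    for line in unified_diff.splitlines():
        # File headers also start with '-' or '+', so skip them first.
        if line.startswith("---") or line.startswith("+++"):
            continue
        if line.startswith("+") or line.startswith("-"):
            churn += 1
    return churn


def label_effort(churn_sizes: list[int], threshold: float) -> list[str]:
    """Label each fix 'high' if its churn exceeds the threshold, else 'low'.

    The threshold is hypothetical; the abstract does not say how the
    high/low boundary was chosen.
    """
    return ["high" if c > threshold else "low" for c in churn_sizes]


example_diff = """--- a/Foo.java
+++ b/Foo.java
@@ -1,3 +1,4 @@
 class Foo {
-    int x = 0;
+    int x = 1;
+    int y = 2;
 }
"""

sizes = [churn_size(example_diff), 10, 50]
labels = label_effort(sizes, threshold=median(sizes))
```

In this sketch the example diff has one deleted and two added lines, so its churn size is 3. Labels produced this way could serve as the target of a binary classifier trained on bug report features, which is the kind of setup the abstract describes.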

