BEIRUT: Repository Mining for Defect Prediction

2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), Pub Date: 2021-10-01, DOI: 10.1109/ISSRE52982.2021.00018

Amir Elmishali, Bruno Sotto-Mayor, Inbal Roshanski, Amit Sultan, Meir Kalech

Citations: 2

Abstract

Software defect prediction is an important activity in the testing phase of the software development life cycle. In research on new defect prediction approaches and on the selection of training sets for the classification task, various benchmarks have been analyzed in the literature. These benchmarks provide features and defect information for specific software archives, so they are commonly used to evaluate new approaches. However, current benchmarks have several limitations, such as a lack of project variability, outdated benchmarks, single-version projects, a small number of projects and metrics, unavailable resources, poor usability, and non-extensible tools. We therefore introduce BEIRUT (Bgu rEpository mIning foR bUg predicTion), a novel tool for generating defect prediction benchmarks that offers three main features. Given an open-source repository from GitHub, BEIRUT mines it by (1) selecting the best $k$ versions based on the defect rate of each version, (2) generating training sets and a testing set for defect prediction, composed of a large number of metrics and the defect information extracted from each selected version, and (3) creating defect prediction models from the extracted metrics. Overall, BEIRUT extracts a diverse catalog of 644 metrics, together with defect information, from each component of the $k$ versions, which are selected automatically based on their defect rates. The data were collected from 512 different projects, starting from 2009. The tool also provides an easy-to-use web interface for configuring the selection of projects and metrics and for managing defect prediction tasks. Moreover, the tool is designed to be extended with new projects and new extractors that add new metrics to the benchmark. The web service can be found at rps.ise.bgu.ac.il/beirut.
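The abstract does not say how the "best" $k$ versions are chosen or how the datasets are laid out. The following Python code is a minimal sketch of steps (1) and (2) under assumptions of our own. The `Version` data model, the defect-rate band [0.05, 0.30], the rule of keeping the most recent eligible versions, and the use of the newest selected version as the testing set are all hypothetical. None of it is BEIRUT's actual implementation or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Version:
    """One release of a project (hypothetical data model for this sketch)."""
    tag: str
    order: int                                   # chronological position, 0 = oldest
    defective: Dict[str, bool]                   # component -> labeled defective?
    metrics: Dict[str, Dict[str, float]] = field(default_factory=dict)  # component -> metric -> value

    @property
    def defect_rate(self) -> float:
        # Fraction of this version's components labeled defective.
        if not self.defective:
            return 0.0
        return sum(self.defective.values()) / len(self.defective)


def select_versions(versions: List[Version], k: int,
                    min_rate: float = 0.05, max_rate: float = 0.30) -> List[Version]:
    # Assumed rule: keep versions whose defect rate lies in [min_rate, max_rate],
    # then return the k most recent of them in chronological order.
    if k <= 0:
        return []
    eligible = sorted((v for v in versions if min_rate <= v.defect_rate <= max_rate),
                      key=lambda v: v.order)
    return eligible[-k:]


def to_rows(version: Version, metric_names: List[str]) -> Tuple[List[List[float]], List[int]]:
    # One row per component: its metric values (0.0 if missing) plus a 0/1 defect label.
    X, y = [], []
    for component in sorted(version.defective):
        values = version.metrics.get(component, {})
        X.append([values.get(name, 0.0) for name in metric_names])
        y.append(int(version.defective[component]))
    return X, y


def build_datasets(selected: List[Version], metric_names: List[str]):
    # Older selected versions become training sets; the newest becomes the testing set.
    if len(selected) < 2:
        raise ValueError("need at least two versions for training and testing sets")
    train = [to_rows(v, metric_names) for v in selected[:-1]]
    test = to_rows(selected[-1], metric_names)
    return train, test


# Example: four hypothetical versions of a project, with one metric, "loc".
versions = [
    Version("v1.0", 0, {"A": True, "B": False, "C": False, "D": False}),        # rate 0.25
    Version("v1.1", 1, {"A": False, "B": False}),                               # rate 0.00
    Version("v1.2", 2, {"A": True, "B": False, "C": False, "D": False, "E": False},
            {"A": {"loc": 120.0}, "B": {"loc": 40.0}}),                          # rate 0.20
    Version("v2.0", 3, {"A": True, "B": True, "C": False}),                     # rate ~0.67
]
selected = select_versions(versions, k=2)
train, test = build_datasets(selected, ["loc"])
print([v.tag for v in selected])  # ['v1.0', 'v1.2']
```

Step (3), model creation, is left out here. The `(X, y)` rows returned by `build_datasets` are in a shape that most binary classifiers accept, so any learner could be trained on them. The choice of classifier is likewise an assumption and is not taken from the paper.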



