{"title":"ClusterCommit: A Just-in-Time Defect Prediction Approach Using Clusters of Projects","authors":"M. Shehab, A. Hamou-Lhadj, L. Alawneh","doi":"10.1109/saner53432.2022.00049","DOIUrl":null,"url":null,"abstract":"Existing Just-in-Time (JIT) bug prediction techniques are designed to work on single projects. In this paper, we present ClusterCommit, a JIT bug prediction approach geared towards clusters of projects that share common libraries and functionalities. Unlike existing techniques, ClusterCommit trains a machine learning model by combining commits from a set of projects that are part of a larger cluster. Once this model is built, ClusterCommit can be used to detect buggy commits in each of these projects. When applying ClusterCommits to 16 projects that revolve around the Hadoop ecosystem and 10 projects of the Hive ecosystem, the results show that ClusterCommit achieves an F1-score of 73% and MCC of 0.44 for both clusters. These preliminary results are very promising and may lead to new JIT bug prediction techniques geared towards projects that are part of a large cluster.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/saner53432.2022.00049","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Existing Just-in-Time (JIT) bug prediction techniques are designed to work on single projects. In this paper, we present ClusterCommit, a JIT bug prediction approach geared towards clusters of projects that share common libraries and functionalities. Unlike existing techniques, ClusterCommit trains a machine learning model by combining commits from a set of projects that are part of a larger cluster. Once this model is built, ClusterCommit can be used to detect buggy commits in each of these projects. When applying ClusterCommits to 16 projects that revolve around the Hadoop ecosystem and 10 projects of the Hive ecosystem, the results show that ClusterCommit achieves an F1-score of 73% and MCC of 0.44 for both clusters. These preliminary results are very promising and may lead to new JIT bug prediction techniques geared towards projects that are part of a large cluster.