B. Luaphol, Boonchoo Srikudkao, J. Polpinij, M. Kaenampornpan
Assembling Relevant Bug Report using the Constraint-based k-means Clustering
2018 International Conference on Information Technology (InCIT), published 2018-10-01
DOI: 10.23919/INCIT.2018.8584866 (https://doi.org/10.23919/INCIT.2018.8584866)
Citations: 1
Abstract
Bug reports provide important information for improving software quality. Today, many bug tracking systems (BTS) such as Bugzilla, Jira, Mantis, and Trac have been developed to collect bug reports from users around the world. Unfortunately, many tasks on a BTS are still performed manually by bug triagers, a process that is time-consuming and error-prone. Although many studies on bug reports have been proposed, one problem may never have been truly investigated: bug dependency, in which an unfixed bug 'a' affects bug 'b', so that bug 'b' cannot be fixed until bug 'a' is fixed. To address this problem, relevant bug reports must be assigned to the same specific category to help developers recognize all bugs that point to the same problem domain. Identifying bug dependencies is a time-consuming and labor-intensive process, which makes this a challenging issue. Therefore, this work presents a method for assembling relevant bug reports into specific clusters using a modified k-means clustering algorithm, called constraint-based k-means clustering. Furthermore, three term-weighting methods, tf, tf-idf, and BM25, are compared. When evaluated by recall, precision, and F-measure, the results show good precision, but recall needs improvement. The method with tf returns better results than the tf-idf and BM25 methods because tf is a local weight that is oriented toward a specific cluster.
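To make the comparison of the three weighting methods concrete, the following is a minimal sketch of how tf, tf-idf, and BM25 weights could be computed for tokenized bug reports. It is not the authors' implementation: the function name `term_weights`, the sample reports, the idf variants, and the BM25 parameters `k1` and `b` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter


def term_weights(docs, scheme="tf", k1=1.2, b=0.75):
    """Return one {term: weight} dict per tokenized document.

    scheme: "tf" (raw count), "tfidf" (count * log(N / df)), or "bm25".
    k1 and b are conventional BM25 defaults, not values from the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    weighted = []
    for d in docs:
        row = {}
        for term, tf in Counter(d).items():
            if scheme == "tf":
                # Local weight only: depends on this document alone.
                row[term] = float(tf)
            elif scheme == "tfidf":
                # Local weight scaled by a global rarity factor.
                row[term] = tf * math.log(n_docs / df[term])
            elif scheme == "bm25":
                # Saturating tf with document-length normalization.
                idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
                norm = tf + k1 * (1 - b + b * len(d) / avg_len)
                row[term] = idf * tf * (k1 + 1) / norm
            else:
                raise ValueError(f"unknown scheme: {scheme}")
        weighted.append(row)
    return weighted


# Hypothetical bug report summaries, already lowercased and tokenized.
reports = [
    "crash when saving file".split(),
    "crash on startup".split(),
    "toolbar icon missing on startup".split(),
]

for scheme in ("tf", "tfidf", "bm25"):
    print(scheme, term_weights(reports, scheme)[0])
```

In this sketch, tf depends only on the report itself, whereas tf-idf and BM25 down-weight terms such as "crash" that occur in several reports. That difference between a local and a global weight is the distinction the abstract draws when it explains why tf performed best.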