{"title":"Tree-based software quality estimation models for fault prediction","authors":"T. Khoshgoftaar, Naeem Seliya","doi":"10.1109/METRIC.2002.1011339","DOIUrl":null,"url":null,"abstract":"Complex high-assurance software systems depend highly on reliability of their underlying software applications. Early identification of high-risk modules can assist in directing quality enhancement efforts to modules that are likely to have a high number of faults. Regression tree models are simple and effective as software quality prediction models, and timely predictions from such models can be used to achieve high software reliability. This paper presents a case study from our comprehensive evaluation (with several large case studies) of currently available regression tree algorithms for software fault prediction. These are, CART-LS (least squares), S-PLUS, and CART-LAD (least absolute deviation). The case study presented comprises of software design metrics collected from a large network telecommunications system consisting of almost 13 million lines of code. Tree models using design metrics are built to predict the number of faults in modules. The algorithms are also compared based on the structure and complexity of their tree models. Performance metrics, average absolute and average relative errors are used to evaluate fault prediction accuracy.","PeriodicalId":165815,"journal":{"name":"Proceedings Eighth IEEE Symposium on Software Metrics","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"157","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings Eighth IEEE Symposium on Software Metrics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/METRIC.2002.1011339","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 157
Abstract
Complex high-assurance software systems depend highly on reliability of their underlying software applications. Early identification of high-risk modules can assist in directing quality enhancement efforts to modules that are likely to have a high number of faults. Regression tree models are simple and effective as software quality prediction models, and timely predictions from such models can be used to achieve high software reliability. This paper presents a case study from our comprehensive evaluation (with several large case studies) of currently available regression tree algorithms for software fault prediction. These are, CART-LS (least squares), S-PLUS, and CART-LAD (least absolute deviation). The case study presented comprises of software design metrics collected from a large network telecommunications system consisting of almost 13 million lines of code. Tree models using design metrics are built to predict the number of faults in modules. The algorithms are also compared based on the structure and complexity of their tree models. Performance metrics, average absolute and average relative errors are used to evaluate fault prediction accuracy.