Using Metrics for Risk Prediction in Object-Oriented Software: A Cross-Version Validation

e Informatica Softw. Eng. J. Pub Date: 2022-01-01. DOI: 10.17706/jsw.17.1.1-20

Salim Moudache, M. Badri


Citations: 1

Abstract



Using Metrics for Risk Prediction in Object-Oriented Software: A Cross-Version Validation

This work investigates, from several perspectives, the potential of a risk model to support Cross-Version Fault and Severity Prediction (CVFSP) in object-oriented software. The risk of a class is addressed in terms of two factors: the number of faults it can contain and their severity. We used various object-oriented metrics to capture these two risk factors, and modeled the risk of a class using the concept of Euclidean distance. The dataset was collected from five successive versions of an open-source Java software system (ANT). We investigated different variants of the risk model, based on various combinations of pairs of object-oriented metrics. To build the classification models, we used several machine learning algorithms: Naive Bayes (NB), J48, Random Forest (RF), Support Vector Machines (SVM) and Multilayer Perceptron (ANN). We evaluated the effectiveness of these models for CVFSP using data from prior versions of the system. We also investigated whether the risk model can output the Empirical Risk (ER) of a class, a continuous value that accounts for both the number of faults and their levels of severity. For this task, we used Linear Regression (LR), Gaussian Process (GP), Random Forest (RF) and M5P (two decision-tree algorithms), SmoReg and Artificial Neural Network (ANN). The risk model achieves acceptable results for both cross-version binary fault prediction (a g-mean of 0.714, an AUC of 0.725) and cross-version multi-class classification of severity levels (a g-mean of 0.758, an AUC of 0.771). It also achieves good results in estimating the empirical risk of a class (intra-version analysis with a correlation coefficient of 0.659, cross-version analysis with a correlation coefficient of 0.486).
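The abstract names two ideas that a small example can make concrete: a class risk score built from a pair of metrics via Euclidean distance, and the g-mean used to report binary fault prediction. The sketch below is illustrative only. The abstract does not give the exact risk formula, the reference point, or the metric pairs, so the min-max normalization, the choice of the origin as reference point, and the function names `euclidean_risk` and `g_mean` are all assumptions. The g-mean is computed with its usual binary definition, the square root of the true-positive rate times the true-negative rate.

```python
import math


def min_max_normalize(values):
    """Scale a list of numbers to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def euclidean_risk(metric_a, metric_b):
    """Assumed risk score: distance of each class's normalized metric pair
    from the origin (0, 0). The paper's actual formula may differ."""
    a = min_max_normalize(metric_a)
    b = min_max_normalize(metric_b)
    return [math.hypot(x, y) for x, y in zip(a, b)]


def g_mean(y_true, y_pred):
    """Geometric mean of true-positive and true-negative rates (1 = faulty)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)


if __name__ == "__main__":
    # Hypothetical metric values for three classes (not data from the paper).
    risks = euclidean_risk([0, 5, 10], [0, 10, 20])
    print([round(r, 4) for r in risks])

    # Hypothetical labels: trained on a prior version, evaluated on the next.
    print(round(g_mean([1, 1, 0, 0], [1, 0, 0, 0]), 4))
```

In a cross-version setting, a score like this would be computed on a prior version to fit a model, and the g-mean would then be measured on predictions for a later version. Because the g-mean multiplies the two per-class rates, a model that labels every class as fault-free scores 0 even when its raw accuracy is high.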

