{"title":"Development of Efficient and Optimal Models for Software Maintainability Prediction using Feature Selection Techniques","authors":"Kirti Lakra, A. Chug","doi":"10.1109/INDIACom51348.2021.00143","DOIUrl":null,"url":null,"abstract":"Software Maintainability is an indispensable characteristic to determine software quality. It can be described as the ease with which necessary changes such as fault correction, performance improvement, addition, or deletion of one or more attributes, etc., can be incorporated. A major purpose of software maintainability is to enable the software to adapt to the changing environment. Machine Learning (ML) algorithms are widely used for Software Maintainability Prediction (SMP). Hence, in the current study, QUES and UIMS, i.e., the two object-oriented datasets are used for SMP. In this study, an attempt has been made to improve the prediction results of five (ML) algorithms, viz., General Regression Neural Network (GRNN), Regularized Greedy Forest (RGF), Gradient Boosting Algorithm (GBA), Multivariate Linear Regression (MLR), and K-Nearest Neighbor (k-NN) on using three different feature selection methods, including the Pearson's Correlation (Filter Method), Backward Elimination (Wrapper Method), and Lasso Regularization (Embedded Method). Feature selection is a procedure to select a set of independent variables that contribute most to the predicted output, hence eliminating the irrelevant features in the data that may reduce the accuracy of an algorithm. The performance of all the models is evaluated using three accuracy measures, i.e., R-Squared, Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). The results portray an improvement in the prediction accuracies after employing feature selection techniques. It is observed that for the QUES dataset, R-Squared value on an average improves by 157.89%. Also, MAE and RMSE values enhance by 19.59% and 24.90%, respectively, depicting an overall decrease in the error. Similarly, for UIMS dataset, R-Squared value on an average increase by 126.08%, representing an improvement in the accuracy. Further, MAE and RMSE values also improve for the UIMS dataset, by 12.44% and 8.16%, respectively.","PeriodicalId":415594,"journal":{"name":"2021 8th International Conference on Computing for Sustainable Global Development (INDIACom)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 8th International Conference on Computing for Sustainable Global Development (INDIACom)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INDIACom51348.2021.00143","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
Software maintainability is an indispensable characteristic for determining software quality. It can be described as the ease with which necessary changes, such as fault correction, performance improvement, or the addition or deletion of one or more attributes, can be incorporated. A major purpose of software maintainability is to enable software to adapt to a changing environment. Machine Learning (ML) algorithms are widely used for Software Maintainability Prediction (SMP). Hence, in the current study, two object-oriented datasets, QUES and UIMS, are used for SMP. An attempt has been made to improve the prediction results of five ML algorithms, viz., General Regression Neural Network (GRNN), Regularized Greedy Forest (RGF), Gradient Boosting Algorithm (GBA), Multivariate Linear Regression (MLR), and K-Nearest Neighbor (k-NN), by applying three different feature selection methods: Pearson's Correlation (filter method), Backward Elimination (wrapper method), and Lasso Regularization (embedded method). Feature selection is a procedure for selecting the set of independent variables that contribute most to the predicted output, thereby eliminating irrelevant features in the data that may reduce the accuracy of an algorithm. The performance of all the models is evaluated using three accuracy measures: R-Squared, Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). The results show an improvement in prediction accuracy after employing feature selection techniques. For the QUES dataset, the R-Squared value improves on average by 157.89%, while the MAE and RMSE values improve by 19.59% and 24.90%, respectively, indicating an overall decrease in error. Similarly, for the UIMS dataset, the R-Squared value increases on average by 126.08%, representing an improvement in accuracy; the MAE and RMSE values also improve, by 12.44% and 8.16%, respectively.
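As a rough illustration of how the three feature selection families (filter, wrapper, embedded) and the three accuracy measures named in the abstract could be applied, the sketch below uses scikit-learn. It is not the authors' code: the file name ques.csv, the CHANGE target column, the 0.3 correlation threshold, the number of features retained, and the linear-regression base estimator are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch of the three feature selection strategies and the three
# accuracy measures (R-Squared, MAE, RMSE) from the study, under the
# assumptions stated above.
import numpy as np
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression, LassoCV
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical data: object-oriented metrics as predictors, "CHANGE" as the
# maintainability target (a common setup for QUES/UIMS-style datasets).
df = pd.read_csv("ques.csv")                      # assumed file name
X, y = df.drop(columns=["CHANGE"]), df["CHANGE"]

# 1) Filter method: keep features whose Pearson correlation with the target
#    exceeds an assumed threshold of 0.3.
corr = X.apply(lambda col: col.corr(y))
filter_features = corr[corr.abs() > 0.3].index.tolist()

# 2) Wrapper method: backward elimination, greedily dropping the feature
#    whose removal hurts the base model the least.
sfs = SequentialFeatureSelector(
    LinearRegression(), direction="backward", n_features_to_select=5
)
sfs.fit(X, y)
wrapper_features = X.columns[sfs.get_support()].tolist()

# 3) Embedded method: Lasso (L1) regularization shrinks irrelevant
#    coefficients to exactly zero; keep the surviving features.
lasso = LassoCV(cv=5).fit(X, y)
embedded_features = X.columns[lasso.coef_ != 0].tolist()

# Evaluate a model on one of the reduced feature sets with the three
# accuracy measures used in the study.
X_tr, X_te, y_tr, y_te = train_test_split(X[filter_features], y, random_state=0)
pred = LinearRegression().fit(X_tr, y_tr).predict(X_te)
print("R2  :", r2_score(y_te, pred))
print("MAE :", mean_absolute_error(y_te, pred))
print("RMSE:", np.sqrt(mean_squared_error(y_te, pred)))
```

In the paper itself, each of the five ML algorithms would take the place of the plain linear regression used here, and the same selection-then-evaluation procedure would be repeated on both the QUES and UIMS datasets.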