{"title":"Assessing the impact of tuning parameter in instance selection based bug resolution classification","authors":"Chaymae Miloudi , Laila Cheikhi , Ali Idri , Alain Abran","doi":"10.1016/j.infsof.2025.107874","DOIUrl":null,"url":null,"abstract":"<div><h3>Context</h3><div>Software maintenance is time-consuming and requires significant effort for bug resolution and various types of software enhancement. Estimating software maintenance effort is challenging for open source software (OSS) without historical data about direct effort expressed in terms of man-days, compared to proprietary software for which this data about effort is available. Therefore, maintenance efforts in the OSS context can only be estimated indirectly through other features, such as OSS bug reports, and other approaches, such as bug resolution prediction models using a number of machine learning (ML) techniques. Although these bug reports are at times large in size, they need to be preprocessed before they can be used. In this context, instance selection (IS) has been presented in the literature as a way of reducing the size of datasets by selecting a subset of instances. Additionally, ML techniques often require fine-tuning of numerous parameters to achieve optimal predictions. This is typically done using tuning parameter (TP) methods.</div></div><div><h3>Objective</h3><div>The empirical study reported here investigated the impact of TP methods together with instance selection algorithms (ISAs) on the performance of bug resolution prediction ML classifiers on five datasets: Eclipse JDT, Eclipse Platform, KDE, LibreOffice, and Apache.</div></div><div><h3>Method</h3><div>To this end, a set of 480 ML classifiers are built using 60 datasets including the five original ones, 15 reduced datasets using Edited Nearest Neighbor (ENN), Repeated Edited Nearest Neighbor (RENN), and all-k Nearest Neighbor (AllkNN) single ISAs, and 40 reduced datasets using Bagging, Random Feature Subsets, and Voting ensemble ISAs, together with four ML techniques (k Nearest Neighbor (kNN), Support Vector Machine (SVM), Voted Perceptron (VP), and Random Tree (RT) using Grid Search (GS) and Default Parameter (DP) configurations. Furthermore, the classifiers were evaluated using Accuracy, Precision, and Recall performance criteria, in addition to the ten-fold cross-validation method. 
Next, these classifiers are compared to determine how parameter tuning and IS can enhance bug resolution prediction performance.</div></div><div><h3>Conclusion</h3><div>The findings revealed that (1) using GS with single ISAs enhanced the performance of the built ML classifiers, (2) using GS with homogeneous and heterogeneous ensemble ISAs enhanced the performance of the built ML classifiers, and (3) associating GS and SVM with RENN (either used as a single ISA or implemented as a base algorithm for ensemble ISAs) gave the best performance.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"188 ","pages":"Article 107874"},"PeriodicalIF":4.3000,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information and Software Technology","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950584925002137","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Context
Software maintenance is time-consuming and requires significant effort for bug resolution and various types of software enhancement. Estimating software maintenance effort is challenging for open source software (OSS), for which historical data on direct effort expressed in man-days is unavailable, unlike proprietary software where such data is recorded. Maintenance effort in the OSS context can therefore only be estimated indirectly, through other features such as OSS bug reports and other approaches such as bug resolution prediction models built with machine learning (ML) techniques. These bug reports, however, are at times large and need to be preprocessed before they can be used. In this context, instance selection (IS) has been presented in the literature as a way of reducing the size of datasets by selecting a subset of instances. Additionally, ML techniques often require fine-tuning of numerous parameters to achieve optimal predictions, which is typically done using tuning parameter (TP) methods.
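To make the instance selection idea concrete, the sketch below shows how a dataset can be shrunk by removing instances whose labels disagree with those of their nearest neighbours. It is only an illustration: the abstract does not name the authors' tooling, so the imbalanced-learn implementation of ENN and the synthetic data are assumptions.

```python
# Illustrative sketch only: instance selection with Edited Nearest Neighbour (ENN),
# assuming scikit-learn and imbalanced-learn rather than the study's actual tooling;
# the data is a synthetic stand-in for a vectorised bug-report dataset.
from sklearn.datasets import make_classification
from imblearn.under_sampling import EditedNearestNeighbours

X, y = make_classification(n_samples=5000, n_features=20, random_state=42)

# ENN drops instances whose class label disagrees with their k nearest neighbours.
enn = EditedNearestNeighbours(n_neighbors=3, sampling_strategy="all")
X_reduced, y_reduced = enn.fit_resample(X, y)

print(f"original: {len(y)} instances, reduced: {len(y_reduced)} instances")
```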
Objective
The empirical study reported here investigated the impact of TP methods together with instance selection algorithms (ISAs) on the performance of bug resolution prediction ML classifiers on five datasets: Eclipse JDT, Eclipse Platform, KDE, LibreOffice, and Apache.
Method
To this end, a set of 480 ML classifiers was built using 60 datasets: the five original ones, 15 reduced datasets obtained with the Edited Nearest Neighbor (ENN), Repeated Edited Nearest Neighbor (RENN), and All-k Nearest Neighbor (AllkNN) single ISAs, and 40 reduced datasets obtained with the Bagging, Random Feature Subsets, and Voting ensemble ISAs. These datasets were combined with four ML techniques (k Nearest Neighbor (kNN), Support Vector Machine (SVM), Voted Perceptron (VP), and Random Tree (RT)) under Grid Search (GS) and Default Parameter (DP) configurations. The classifiers were evaluated using the Accuracy, Precision, and Recall performance criteria with ten-fold cross-validation. These classifiers were then compared to determine how parameter tuning and IS can enhance bug resolution prediction performance.
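A rough sketch of the Grid Search versus Default Parameter comparison described above, shown for the SVM technique only; the scikit-learn estimators, the parameter grid, and the synthetic data are assumptions, since the abstract does not specify the implementation (kNN, VP, and RT would be handled analogously).

```python
# Sketch of the GS vs DP comparison for one technique (SVM), with ten-fold
# cross-validation and Accuracy/Precision/Recall; assumes scikit-learn and a
# synthetic stand-in for one of the reduced bug-report datasets.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
scoring = ["accuracy", "precision_macro", "recall_macro"]

# Default Parameter (DP) configuration.
dp = cross_validate(SVC(), X, y, cv=10, scoring=scoring)

# Grid Search (GS) configuration: tune C and kernel, then evaluate the tuned model.
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}, cv=10)
grid.fit(X, y)
gs = cross_validate(grid.best_estimator_, X, y, cv=10, scoring=scoring)

print("DP accuracy:", dp["test_accuracy"].mean())
print("GS accuracy:", gs["test_accuracy"].mean())
```

In a full replication, the grid would cover each technique's relevant hyperparameters and the tuning would be nested inside the cross-validation to avoid optimistic estimates.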
Conclusion
The findings revealed that (1) using GS with single ISAs enhanced the performance of the built ML classifiers, (2) using GS with homogeneous and heterogeneous ensemble ISAs enhanced the performance of the built ML classifiers, and (3) associating GS and SVM with RENN (either used as a single ISA or implemented as a base algorithm for ensemble ISAs) gave the best performance.
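As a final illustration, finding (3) can be approximated by chaining RENN reduction with a grid-searched SVM; again, the imbalanced-learn RENN implementation, the parameter grid, and the synthetic data are assumptions, not the authors' actual setup.

```python
# Hedged sketch of the best-reported combination: RENN-reduced data fed to a
# grid-searched SVM, evaluated with ten-fold cross-validation (assumed tooling).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from imblearn.under_sampling import RepeatedEditedNearestNeighbours

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# RENN applies ENN repeatedly until no further instances are removed.
renn = RepeatedEditedNearestNeighbours(n_neighbors=3, sampling_strategy="all")
X_r, y_r = renn.fit_resample(X, y)

grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]},
                    cv=10, scoring="accuracy")
grid.fit(X_r, y_r)
print("best params:", grid.best_params_, "mean CV accuracy:", round(grid.best_score_, 3))
```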
Journal Introduction:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.