Search-based Feature Selection for Cross-Project Fault Prediction
Yogita Khatri, S. Singh
2022 IEEE Pune Section International Conference (PuneCon), published 2022-12-15
DOI: 10.1109/PuneCon55413.2022.10014936 (https://doi.org/10.1109/PuneCon55413.2022.10014936)
Citations: 0
Abstract
Cross-project fault prediction (CPFP) is an active research area in software engineering. CPFP is used when within-project training data is scarce: a fault prediction model for software project ‘X’ is built from the defect/fault data of another software project ‘Y’. However, the difference in data distribution between the two projects limits its success. Many existing approaches address this issue by selecting relevant instances from the training data, without paying attention to feature selection (FS). To assess the power of FS for effective CPFP, we investigated two search-based FS algorithms: the Binary Genetic Algorithm (BGA) and the Binary Particle Swarm Optimization (BPSO) algorithm. We performed 26 CPFP experiments on 8 software projects and compared the resulting models, BPSO_CPFP and BGA_CPFP, with a CPFP model built with all features (ALL_CPFP). Although both BPSO_CPFP and BGA_CPFP showed potential over ALL_CPFP, BPSO_CPFP performed better than BGA_CPFP in capturing the important features for effective CPFP.
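As an illustration of what search-based feature selection with BPSO can look like, the following is a minimal sketch of a generic binary PSO with a sigmoid transfer function. It is not the authors' implementation: the abstract does not state the fitness function, classifier, or parameter settings, so `bpso_select`, `toy_fitness`, the `RELEVANT` index set, and all parameter values are hypothetical. In a real CPFP setting, the fitness would presumably score a classifier trained on the source project using only the features enabled in the mask.

```python
import math
import random


def bpso_select(fitness, n_features, n_particles=20, n_iters=50,
                w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Generic binary PSO maximising fitness(mask); mask is a list of 0/1."""
    rng = random.Random(seed)
    # Random initial bit masks. Assumption for this sketch: one particle starts
    # with every feature enabled, so the result can never score below the
    # all-features baseline.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    pos[0] = [1] * n_features
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update, clamped to [-v_max, v_max].
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))
                vel[i][d] = v
                # Sigmoid transfer: velocity becomes the probability of bit = 1.
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical toy problem: features 0, 3 and 5 are "relevant"; every other
# selected feature costs a small penalty.
RELEVANT = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(1 for i in RELEVANT if mask[i])
    extras = sum(mask) - hits
    return hits - 0.25 * extras


best_mask, best_fit = bpso_select(toy_fitness, n_features=10)
print(best_mask, best_fit)
```

Passing the fitness as a callable keeps the search independent of any particular classifier or metric, so the same loop could be reused with a different objective or compared against a genetic-algorithm search over the same bit masks.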