Chi Zhang, Xiaoli Wang, Jinfu Chen, Saihua Cai, Rexford Nii Ayitey Sosu
Journal of Software-Evolution and Process, volume 36, issue 9. Published 2024-04-28. DOI: 10.1002/smr.2674. https://onlinelibrary.wiley.com/doi/10.1002/smr.2674
A novel defect prediction method based on semantic feature enhancement
Cross-project defect prediction (CPDP) techniques that build prediction models from traditional hand-crafted features are well developed, but they usually ignore the semantic and structural information inside the program and fail to capture hidden features that are critical for predicting a program's category, which leads to poor defect prediction results. Researchers have proposed using deep learning to automatically extract semantic features from programs and fuse them with traditional features as training data. In practice, however, two issues still need to be explored: how to represent the semantic features of a program effectively, and in what ratio the two types of features should be fused to maximize the effectiveness of the model. In this paper, we propose a semantic feature enhancement-based defect prediction framework (SFE-DP), which applies data augmentation to the semantic feature set extracted from program code. We also add a self-attention layer and a matching layer to the model structure to filter out low-efficiency and non-critical semantic features. Finally, we use a hybrid loss function to iteratively optimize the model parameters. Extensive experiments show that SFE-DP outperforms the baseline approaches on 90 CPDP task pairs formed from 10 open-source projects.
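The figure of 90 task pairs is consistent with treating every ordered (source, target) combination of the 10 projects as one CPDP task, since 10 × 9 = 90. The sketch below illustrates that count under this assumption. The abstract does not name the projects, so the names here are placeholders, and the pairing scheme is the conventional CPDP setup rather than something the abstract confirms.

```python
from itertools import permutations

# Placeholder names: the abstract does not list the 10 open-source projects.
projects = [f"project_{i}" for i in range(10)]

# Assumption: each ordered (source, target) pair is one CPDP task,
# i.e. train on the source project and predict defects in the target project.
cpdp_tasks = list(permutations(projects, 2))

print(len(cpdp_tasks))  # 90
```

Ordered pairs are used because training on project A and testing on project B is a different task from the reverse.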