Cheng Zeng, Chunpeng Zhou, Shengkai Lv, Peng He, Jie Huang
GCN2defect: Graph Convolutional Networks for SMOTETomek-based Software Defect Prediction
2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), 2021-10-01
DOI: 10.1109/ISSRE52982.2021.00020
Citations: 3
Abstract
Since network metrics were introduced into software defect prediction, the dependency network of software modules has been widely used. Network embedding models represent nodes as low-dimensional vectors while preserving the topological structure of the network. However, the traditional network embedding models used in software engineering do not employ deep learning strategies, whereas graph neural networks (GNNs) have recently proven to be an effective deep learning framework for learning from graph data. As a variant of GNN, the graph convolutional network (GCN) has achieved appealing results in node classification and link prediction. Inspired by the performance of GCN, we propose GCN2defect, which extends GCN to automatically learn an encoding of the software dependency network and ultimately improve software defect prediction. Specifically, we first construct a program's Class Dependency Network and then apply node2vec embedding learning to obtain the structural features of the network automatically. We then combine the learned structural features with traditional software code features to initialize the node attributes in the Class Dependency Network. Next, we feed the dependency network to the GCN to obtain a deeper representation of each class. To improve prediction accuracy, we also employ SMOTETomek sampling to address data imbalance. Finally, we evaluate the proposed method on eight open-source programs and demonstrate that, on average, GCN2defect improves on the state-of-the-art approach by 6.84% to 23.85% in terms of F-measure.
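The abstract does not give the layer formula or the network architecture, so the following is only a minimal sketch of the feature-combination and graph-convolution steps. It assumes the standard GCN propagation rule, H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W). The three-class dependency graph, the stand-in structural vectors (in place of real node2vec output), the single code metric per class, the fixed weights, and the choice to treat dependencies as undirected are all hypothetical. node2vec training and SMOTETomek sampling are not implemented here.

```python
import math


def normalized_adjacency(edges, n):
    """Return D^{-1/2} (A + I) D^{-1/2} for a graph on n nodes."""
    a = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0  # assumption: dependency edges treated as undirected
    for i in range(n):
        a[i][i] = 1.0  # add self-loops (the "+ I" term)
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def gcn_layer(a_hat, h, w):
    """One propagation step: ReLU(a_hat @ h @ w)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]


# Toy Class Dependency Network: class 0 depends on 1, class 1 depends on 2.
edges = [(0, 1), (1, 2)]

# Hypothetical stand-ins for learned node2vec embeddings (2 dims per class).
structural = [[0.1, 0.3], [0.2, 0.1], [0.4, 0.0]]

# Hypothetical traditional code feature (one metric per class).
code_metrics = [[12.0], [3.0], [7.0]]

# Initialize node attributes by concatenating both feature groups (3 dims).
features = [s + c for s, c in zip(structural, code_metrics)]

# Hypothetical fixed 3x2 weight matrix; a real model would learn this.
weights = [[0.5, -0.2], [0.1, 0.3], [0.05, 0.02]]

a_hat = normalized_adjacency(edges, 3)
hidden = gcn_layer(a_hat, features, weights)
```

Each row of `hidden` is a class representation that mixes the class's own attributes with those of its direct neighbors in the dependency network. Stacking further layers would widen that neighborhood, and a classifier on the final representations would produce the defect prediction.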