Software Defect Prediction via Transformer
Qihang Zhang, Bin Wu
2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), June 2020
DOI: 10.1109/ITNEC48623.2020.9084745
To enhance software reliability, software defect prediction is used to identify potentially defective code and to make software inspection more efficient. Traditional defect prediction methods mainly focus on designing static code metrics and building machine learning classifiers to predict which pieces of code are likely to be defective. However, these manually extracted features do not capture the syntactic and semantic information of programs, which is much more important than such metrics and can improve the accuracy of defect prediction. In this paper, we propose a framework called software defect prediction via Transformer (DP-Transformer), which captures syntactic and semantic features from programs and uses them to improve defect prediction. Specifically, we first parse source code into abstract syntax trees (ASTs) and then select representative nodes from the ASTs to form token vectors. We then use mapping and word embedding to convert the token vectors into numerical vectors, which are fed to a Transformer. The Transformer automatically extracts syntactic and semantic features and passes them to a Logistic Regression classifier. We evaluate our method on seven labeled open-source Java projects, using the F-measure as the evaluation criterion. The experimental results show that, on average, DP-Transformer improves on the state-of-the-art method by 8%.
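The first two pipeline steps (AST parsing with node selection, then mapping tokens to integers) can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the paper parses Java, but to stay runnable with only the standard library this sketch uses Python's built-in `ast` module as a stand-in parser. Which nodes count as "representative" here (function definitions and calls recorded by name, a few control-flow nodes recorded by type) is a hypothetical choice, as are the helper names `ast_tokens`, `build_vocab`, `encode` and the reserved ids 0 (padding) and 1 (unknown token).

```python
from __future__ import annotations

import ast

# Hypothetical choice of "representative" control-flow nodes, recorded by type name.
CONTROL_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.Return)

PAD, UNK = 0, 1  # reserved ids (assumption): 0 pads short sequences, 1 marks unseen tokens


def ast_tokens(source: str) -> list[str]:
    """Parse source and return one token per representative node, in pre-order."""
    tokens: list[str] = []

    def visit(node: ast.AST) -> None:
        if isinstance(node, ast.FunctionDef):
            tokens.append(node.name)  # declaration: record the function name
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            tokens.append(node.func.id)  # invocation: record the callee name
        elif isinstance(node, CONTROL_NODES):
            tokens.append(type(node).__name__)  # control flow: record the node type
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def build_vocab(token_lists: list[list[str]]) -> dict[str, int]:
    """Assign each distinct token an integer id, starting after the reserved ids."""
    vocab: dict[str, int] = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2
    return vocab


def encode(tokens: list[str], vocab: dict[str, int], max_len: int) -> list[int]:
    """Map tokens to ids, then truncate or pad to a fixed length."""
    ids = [vocab.get(tok, UNK) for tok in tokens][:max_len]
    return ids + [PAD] * (max_len - len(ids))


SAMPLE = (
    "def check(x):\n"
    "    if x > 0:\n"
    "        return foo(x)\n"
    "    for i in range(x):\n"
    "        print(i)\n"
)

tokens = ast_tokens(SAMPLE)
vocab = build_vocab([tokens])
print(tokens)                     # ['check', 'If', 'Return', 'foo', 'For', 'range', 'print']
print(encode(tokens, vocab, 10))  # [2, 3, 4, 5, 6, 7, 8, 0, 0, 0]
```

Truncating long inputs and padding short ones to a common length yields the fixed-size integer sequences an embedding layer expects; keeping separate ids for padding and unknown tokens lets later stages tell them apart.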
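The downstream data flow (embedding, Transformer feature extraction, Logistic Regression) can be traced with the minimal, dependency-free sketch below. It makes several assumptions that do not come from the paper: token id 0 means padding; the "Transformer" is reduced to a single-head scaled dot-product self-attention layer with sinusoidal position encodings, omitting multi-head attention, feed-forward sublayers, residual connections, and layer normalization; features are mean-pooled over non-padded positions; and all weights are random and untrained. The class name `TinyDefectModel` and its dimensions are hypothetical.

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) == 0.0 for masked entries
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    """Multiply a matrix W (list of rows) by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def positional_encoding(pos, d):
    """Sinusoidal encoding: sin on even dimensions, cos on odd dimensions."""
    return [math.sin(pos / 10000 ** (j / d)) if j % 2 == 0
            else math.cos(pos / 10000 ** ((j - 1) / d)) for j in range(d)]


def self_attention(X, mask, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention; padded keys get zero weight."""
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    scale = math.sqrt(len(K[0]))
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / scale if keep else -math.inf
                  for k, keep in zip(K, mask)]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


class TinyDefectModel:
    """Embedding -> one self-attention layer -> mean pooling -> logistic regression.
    Weights are random (untrained); this only illustrates the data flow."""

    def __init__(self, vocab_size, d_model=8, seed=0):
        rng = random.Random(seed)
        self.d = d_model
        self.embedding = random_matrix(vocab_size, d_model, rng)  # row i embeds token id i
        self.Wq = random_matrix(d_model, d_model, rng)
        self.Wk = random_matrix(d_model, d_model, rng)
        self.Wv = random_matrix(d_model, d_model, rng)
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(d_model)]  # logistic regression weights
        self.b = 0.0

    def features(self, ids):
        """Map a padded id sequence (needs at least one non-zero id) to one feature vector."""
        mask = [i != 0 for i in ids]
        X = [[e + p for e, p in zip(self.embedding[i], positional_encoding(pos, self.d))]
             for pos, i in enumerate(ids)]
        H = self_attention(X, mask, self.Wq, self.Wk, self.Wv)
        kept = [h for h, m in zip(H, mask) if m]  # mean-pool over real tokens only
        return [sum(col) / len(kept) for col in zip(*kept)]

    def predict_proba(self, ids):
        """Logistic regression on the pooled features: estimated P(defective)."""
        h = self.features(ids)
        z = sum(wi * hi for wi, hi in zip(self.w, h)) + self.b
        return 1.0 / (1.0 + math.exp(-z))


model = TinyDefectModel(vocab_size=9)
print(model.predict_proba([2, 3, 4, 5, 0, 0]))  # a value in (0, 1); meaningless until trained
```

Masking padded keys and excluding them from pooling makes the features independent of how much padding a file receives; in practice a deep learning framework would supply the Transformer layers and a training loop.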
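Results are reported in F-measure. A minimal sketch of that metric for binary labels follows, assuming label 1 means "defective" and that the F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state which weighting is used, and the function name `f_measure` is illustrative.

```python
from __future__ import annotations


def f_measure(y_true: list[int], y_pred: list[int]) -> float:
    """F1 score for the defective class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # defects found
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # defects missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# tp = 2, fp = 1, fn = 1, so precision = recall = 2/3 and F1 = 2/3
print(round(f_measure([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]), 4))  # 0.6667
```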