Khadija Hanifi, R. Fouladi, Basak Gencer Unsalver, Goksu Karadag
{"title":"软件漏洞预测知识在编程语言间的传递","authors":"Khadija Hanifi, R. Fouladi, Basak Gencer Unsalver, Goksu Karadag","doi":"10.48550/arXiv.2303.06177","DOIUrl":null,"url":null,"abstract":"Developing automated and smart software vulnerability detection models has been receiving great attention from both research and development communities. One of the biggest challenges in this area is the lack of code samples for all different programming languages. In this study, we address this issue by proposing a transfer learning technique to leverage available datasets and generate a model to detect common vulnerabilities in different programming languages. We use C source code samples to train a Convolutional Neural Network (CNN) model, then, we use Java source code samples to adopt and evaluate the learned model. We use code samples from two benchmark datasets: NIST Software Assurance Reference Dataset (SARD) and Draper VDISC dataset. The results show that proposed model detects vulnerabilities in both C and Java codes with average recall of 72\\%. Additionally, we employ explainable AI to investigate how much each feature contributes to the knowledge transfer mechanisms between C and Java in the proposed model.","PeriodicalId":420861,"journal":{"name":"International Conference on Evaluation of Novel Approaches to Software Engineering","volume":"74 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Software Vulnerability Prediction Knowledge Transferring Between Programming Languages\",\"authors\":\"Khadija Hanifi, R. Fouladi, Basak Gencer Unsalver, Goksu Karadag\",\"doi\":\"10.48550/arXiv.2303.06177\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Developing automated and smart software vulnerability detection models has been receiving great attention from both research and development communities. One of the biggest challenges in this area is the lack of code samples for all different programming languages. In this study, we address this issue by proposing a transfer learning technique to leverage available datasets and generate a model to detect common vulnerabilities in different programming languages. We use C source code samples to train a Convolutional Neural Network (CNN) model, then, we use Java source code samples to adopt and evaluate the learned model. We use code samples from two benchmark datasets: NIST Software Assurance Reference Dataset (SARD) and Draper VDISC dataset. The results show that proposed model detects vulnerabilities in both C and Java codes with average recall of 72\\\\%. Additionally, we employ explainable AI to investigate how much each feature contributes to the knowledge transfer mechanisms between C and Java in the proposed model.\",\"PeriodicalId\":420861,\"journal\":{\"name\":\"International Conference on Evaluation of Novel Approaches to Software Engineering\",\"volume\":\"74 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-03-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Conference on Evaluation of Novel Approaches to Software Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2303.06177\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Evaluation of Novel Approaches to Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2303.06177","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Software Vulnerability Prediction Knowledge Transferring Between Programming Languages
Developing automated and smart software vulnerability detection models has been receiving great attention from both research and development communities. One of the biggest challenges in this area is the lack of code samples for all different programming languages. In this study, we address this issue by proposing a transfer learning technique to leverage available datasets and generate a model to detect common vulnerabilities in different programming languages. We use C source code samples to train a Convolutional Neural Network (CNN) model, then, we use Java source code samples to adopt and evaluate the learned model. We use code samples from two benchmark datasets: NIST Software Assurance Reference Dataset (SARD) and Draper VDISC dataset. The results show that proposed model detects vulnerabilities in both C and Java codes with average recall of 72\%. Additionally, we employ explainable AI to investigate how much each feature contributes to the knowledge transfer mechanisms between C and Java in the proposed model.