Author: Marwan Omar
Venue: 2023 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT)
Published: 2023-05-22
DOI: https://doi.org/10.1109/JEEIT58638.2023.10185860
VulDefend: A Novel Technique based on Pattern-exploiting Training for Detecting Software Vulnerabilities Using Language Models
Detecting vulnerabilities in source code is a critical task in software assurance. In this work, we propose a semi-supervised learning approach that leverages pattern-exploiting training and cloze-style questions. Our approach trains a language model on the SARD and Devign datasets of code snippets with vulnerabilities; each input is generated by masking parts of the code and asking the model to predict the masked tokens. Experimental results demonstrate that our approach can effectively detect vulnerabilities in source code while leveraging the pattern information learned from the code snippets. This work highlights the feasibility of using pattern-exploiting training and cloze-style questions to improve vulnerability detection in source code.
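The cloze-style formulation described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the pattern wording, the mask token, the verbalizer words ("yes"/"no"), and the helper names build_cloze and label_probs are all assumptions made for illustration, and the scores passed in at the end are stand-ins for what a masked language model would output.

```python
import math

# Placeholder mask token (assumption); the real token depends on the model used.
MASK = "<mask>"

# Hypothetical verbalizer: maps each class label to one word the model could fill in.
VERBALIZER = {"vulnerable": "yes", "safe": "no"}


def build_cloze(code: str) -> str:
    """Wrap a code snippet in a cloze-style pattern with a single masked slot."""
    return f"{code}\nQuestion: is this code vulnerable? Answer: {MASK}"


def label_probs(token_scores: dict[str, float]) -> dict[str, float]:
    """Turn scores for the verbalizer words at the masked slot into label probabilities."""
    logits = {label: token_scores[word] for label, word in VERBALIZER.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(score - top) for label, score in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


snippet = "char buf[8]; strcpy(buf, user_input);"
prompt = build_cloze(snippet)

# Stand-in scores (assumption); a real masked language model would supply these.
probs = label_probs({"yes": 2.0, "no": 0.0})
prediction = max(probs, key=probs.get)
```

Restricting the softmax to the verbalizer words turns a masked-token prediction into a classification decision, which is the general idea behind pattern-exploiting training; the patterns and verbalizers actually used in VulDefend are not given in this abstract.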