Title: Automatically Detect Software Security Vulnerabilities Based on Natural Language Processing Techniques and Machine Learning Algorithms
Authors: Donghwang Cho, Vu Ngoc Son, D. Duc
Journal: Journal of ICT Research and Applications
DOI: https://doi.org/10.5614/itbj.ict.res.appl.2022.16.1.5
Publication date: 2022-05-11
Publication type: Journal Article
Impact factor: 0.5 (JCR Q4, Computer Science, Information Systems)
Citations: 4
Abstract
Software vulnerabilities pose a serious problem because attackers often compromise systems by exploiting them. There are two main approaches to detecting software vulnerabilities: (i) signature-based detection, which compares software against a list of known security vulnerabilities; and (ii) behavior analysis-based detection, which analyzes the software code using classification algorithms. To improve the accuracy of software security vulnerability detection, this study proposes a new approach that combines a technique for analyzing and standardizing software code with the random forest (RF) classification algorithm. The novelty and advantage of the proposed method is that, rather than trying to define the behaviors of functions in order to identify abnormal ones, it uses the Word2vec natural language processing model to normalize functions and extract their features. Finally, to detect security vulnerabilities in the functions, the study applies the RF classifier, a popular and effective supervised machine learning algorithm.
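The abstract does not spell out how functions are standardized before Word2vec is applied. The following is a minimal sketch of one common way to do it, not the authors' implementation: it tokenizes C-like source and replaces user-defined names and numeric literals with generic placeholders. The `KEEP` set, the `FUN`/`VAR`/`NUM` placeholder scheme, and the name `normalize_function` are all assumptions made for illustration.

```python
import re

# Assumption: a small, hypothetical set of C keywords and library calls that
# are kept verbatim because they may carry security-relevant meaning.
KEEP = {
    "int", "char", "void", "if", "else", "for", "while", "return", "sizeof",
    "strcpy", "memcpy", "malloc", "free",
}

IDENT_RE = re.compile(r"[A-Za-z_]\w*")
# Identifiers, runs of digits, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize_function(source):
    """Tokenize C-like source and map user-defined names to placeholders."""
    tokens = TOKEN_RE.findall(source)
    names = {}  # original identifier -> placeholder
    out = []
    for i, tok in enumerate(tokens):
        if IDENT_RE.fullmatch(tok) and tok not in KEEP:
            if tok not in names:
                # An identifier directly followed by "(" is treated as a function.
                is_call = i + 1 < len(tokens) and tokens[i + 1] == "("
                prefix = "FUN" if is_call else "VAR"
                count = sum(1 for p in names.values() if p.startswith(prefix)) + 1
                names[tok] = f"{prefix}{count}"
            out.append(names[tok])
        elif tok.isdigit():
            out.append("NUM")  # collapse numeric literals
        else:
            out.append(tok)
    return out


example = "void copy(char *dst, char *src) { char buf[16]; strcpy(buf, src); }"
print(normalize_function(example))
```

Token sequences of this kind could then be used to train a Word2vec model (for example with gensim's `Word2Vec`), and the per-function averaged vectors passed to a random forest classifier (for example scikit-learn's `RandomForestClassifier`). Those library choices are also assumptions; the abstract names only the techniques.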
About the journal:
Journal of ICT Research and Applications welcomes full research articles in the area of Information and Communication Technology from the following subject areas: Information Theory, Signal Processing, Electronics, Computer Network, Telecommunication, Wireless & Mobile Computing, Internet Technology, Multimedia, Software Engineering, Computer Science, Information System and Knowledge Management. Authors are invited to submit articles that have not been published previously and are not under consideration elsewhere.