基于自然语言处理技术和机器学习算法的软件安全漏洞自动检测

IF 0.5 Q4 COMPUTER SCIENCE, INFORMATION SYSTEMS
Donghwang Cho, Vu Ngoc Son, D. Duc
{"title":"基于自然语言处理技术和机器学习算法的软件安全漏洞自动检测","authors":"Donghwang Cho, Vu Ngoc Son, D. Duc","doi":"10.5614/itbj.ict.res.appl.2022.16.1.5","DOIUrl":null,"url":null,"abstract":"Nowadays, software vulnerabilities pose a serious problem, because cyber-attackers often find ways to attack a system by exploiting software vulnerabilities. Detecting software vulnerabilities can be done using two main methods: i) signature-based detection, i.e. methods based on a list of known security vulnerabilities as a basis for contrasting and comparing; ii) behavior analysis-based detection using classification algorithms, i.e., methods based on analyzing the software code. In order to improve the ability to accurately detect software security vulnerabilities, this study proposes a new approach based on a technique of analyzing and standardizing software code and the random forest (RF) classification algorithm. The novelty and advantages of our proposed method are that to determine abnormal behavior of functions in the software, instead of trying to define behaviors of functions, this study uses the Word2vec natural language processing model to normalize and extract features of functions. Finally, to detect security vulnerabilities in the functions, this study proposes to use a popular and effective supervised machine learning algorithm.","PeriodicalId":42785,"journal":{"name":"Journal of ICT Research and Applications","volume":" ","pages":""},"PeriodicalIF":0.5000,"publicationDate":"2022-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Automatically Detect Software Security Vulnerabilities Based on Natural Language Processing Techniques and Machine Learning Algorithms\",\"authors\":\"Donghwang Cho, Vu Ngoc Son, D. Duc\",\"doi\":\"10.5614/itbj.ict.res.appl.2022.16.1.5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Nowadays, software vulnerabilities pose a serious problem, because cyber-attackers often find ways to attack a system by exploiting software vulnerabilities. Detecting software vulnerabilities can be done using two main methods: i) signature-based detection, i.e. methods based on a list of known security vulnerabilities as a basis for contrasting and comparing; ii) behavior analysis-based detection using classification algorithms, i.e., methods based on analyzing the software code. In order to improve the ability to accurately detect software security vulnerabilities, this study proposes a new approach based on a technique of analyzing and standardizing software code and the random forest (RF) classification algorithm. The novelty and advantages of our proposed method are that to determine abnormal behavior of functions in the software, instead of trying to define behaviors of functions, this study uses the Word2vec natural language processing model to normalize and extract features of functions. Finally, to detect security vulnerabilities in the functions, this study proposes to use a popular and effective supervised machine learning algorithm.\",\"PeriodicalId\":42785,\"journal\":{\"name\":\"Journal of ICT Research and Applications\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.5000,\"publicationDate\":\"2022-05-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of ICT Research and Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5614/itbj.ict.res.appl.2022.16.1.5\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of ICT Research and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5614/itbj.ict.res.appl.2022.16.1.5","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 4

摘要

如今,软件漏洞是一个严重的问题,因为网络攻击者经常通过利用软件漏洞来找到攻击系统的方法。检测软件漏洞可以使用两种主要方法:i)基于签名的检测,即基于已知安全漏洞列表的方法,作为对比和比较的基础;ii)使用分类算法的基于行为分析的检测,即基于分析软件代码的方法。为了提高准确检测软件安全漏洞的能力,本研究提出了一种基于软件代码分析和标准化技术以及随机森林(RF)分类算法的新方法。我们提出的方法的新颖性和优势在于,为了确定软件中函数的异常行为,本研究使用Word2vec自然语言处理模型来规范和提取函数的特征,而不是试图定义函数的行为。最后,为了检测函数中的安全漏洞,本研究提出使用一种流行且有效的监督机器学习算法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Automatically Detect Software Security Vulnerabilities Based on Natural Language Processing Techniques and Machine Learning Algorithms
Nowadays, software vulnerabilities pose a serious problem, because cyber-attackers often find ways to attack a system by exploiting software vulnerabilities. Detecting software vulnerabilities can be done using two main methods: i) signature-based detection, i.e. methods based on a list of known security vulnerabilities as a basis for contrasting and comparing; ii) behavior analysis-based detection using classification algorithms, i.e., methods based on analyzing the software code. In order to improve the ability to accurately detect software security vulnerabilities, this study proposes a new approach based on a technique of analyzing and standardizing software code and the random forest (RF) classification algorithm. The novelty and advantages of our proposed method are that to determine abnormal behavior of functions in the software, instead of trying to define behaviors of functions, this study uses the Word2vec natural language processing model to normalize and extract features of functions. Finally, to detect security vulnerabilities in the functions, this study proposes to use a popular and effective supervised machine learning algorithm.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of ICT Research and Applications
Journal of ICT Research and Applications COMPUTER SCIENCE, INFORMATION SYSTEMS-
CiteScore
1.60
自引率
0.00%
发文量
13
审稿时长
24 weeks
期刊介绍: Journal of ICT Research and Applications welcomes full research articles in the area of Information and Communication Technology from the following subject areas: Information Theory, Signal Processing, Electronics, Computer Network, Telecommunication, Wireless & Mobile Computing, Internet Technology, Multimedia, Software Engineering, Computer Science, Information System and Knowledge Management. Authors are invited to submit articles that have not been published previously and are not under consideration elsewhere.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信