使用源代码度量预测安全漏洞

Sundarakrishnan Ganesh, Tobias Ohlsson, Francis Palma
{"title":"使用源代码度量预测安全漏洞","authors":"Sundarakrishnan Ganesh, Tobias Ohlsson, Francis Palma","doi":"10.1109/SweDS53855.2021.9638301","DOIUrl":null,"url":null,"abstract":"Large open-source systems generate and operate on a plethora of sensitive enterprise data. Thus, security threats or vulnerabilities must not be present in open-source systems and must be resolved as early as possible in the development phases to avoid catastrophic consequences. One way to recognize security vulnerabilities is to predict them while developers write code to minimize costs and resources. This study examines the effectiveness of machine learning algorithms to predict potential security vulnerabilities by analyzing the source code of a system. We obtained the security vulnerabilities dataset from Apache Tomcat security reports for version 4.x to 10.x. We also collected the source code of Apache Tomcat 4.x to 10.x to compute 43 object-oriented metrics. We assessed four traditional supervised learning algorithms, i.e., Naive Bayes (NB), Decision Tree (DT), K-Nearest Neighbors (KNN), and Logistic Regression (LR), to understand their efficacy in predicting security vulnerabilities. We obtained the highest accuracy of 80.6% using the KNN. Thus, the KNN classifier was demonstrated to be the most effective of all the models we built. The DT classifier also performed well but under-performed when it came to multi-class classification.","PeriodicalId":194514,"journal":{"name":"2021 Swedish Workshop on Data Science (SweDS)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Predicting Security Vulnerabilities using Source Code Metrics\",\"authors\":\"Sundarakrishnan Ganesh, Tobias Ohlsson, Francis Palma\",\"doi\":\"10.1109/SweDS53855.2021.9638301\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Large open-source systems generate and operate on a plethora of sensitive enterprise data. Thus, security threats or vulnerabilities must not be present in open-source systems and must be resolved as early as possible in the development phases to avoid catastrophic consequences. One way to recognize security vulnerabilities is to predict them while developers write code to minimize costs and resources. This study examines the effectiveness of machine learning algorithms to predict potential security vulnerabilities by analyzing the source code of a system. We obtained the security vulnerabilities dataset from Apache Tomcat security reports for version 4.x to 10.x. We also collected the source code of Apache Tomcat 4.x to 10.x to compute 43 object-oriented metrics. We assessed four traditional supervised learning algorithms, i.e., Naive Bayes (NB), Decision Tree (DT), K-Nearest Neighbors (KNN), and Logistic Regression (LR), to understand their efficacy in predicting security vulnerabilities. We obtained the highest accuracy of 80.6% using the KNN. Thus, the KNN classifier was demonstrated to be the most effective of all the models we built. The DT classifier also performed well but under-performed when it came to multi-class classification.\",\"PeriodicalId\":194514,\"journal\":{\"name\":\"2021 Swedish Workshop on Data Science (SweDS)\",\"volume\":\"16 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 Swedish Workshop on Data Science (SweDS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SweDS53855.2021.9638301\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 Swedish Workshop on Data Science (SweDS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SweDS53855.2021.9638301","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

大型开源系统生成并操作大量敏感的企业数据。因此,安全威胁或漏洞一定不能出现在开源系统中,必须在开发阶段尽早解决,以避免灾难性的后果。识别安全漏洞的一种方法是在开发人员编写代码时进行预测,以最小化成本和资源。本研究通过分析系统的源代码来检验机器学习算法预测潜在安全漏洞的有效性。我们从Apache Tomcat版本4的安全报告中获得了安全漏洞数据集。X到10。我们还收集了Apache Tomcat 4的源代码。X到10。X来计算43个面向对象的度量。我们评估了四种传统的监督学习算法,即朴素贝叶斯(NB)、决策树(DT)、k近邻(KNN)和逻辑回归(LR),以了解它们在预测安全漏洞方面的有效性。我们使用KNN获得了80.6%的最高准确率。因此,KNN分类器被证明是我们建立的所有模型中最有效的。DT分类器也表现良好,但在多类分类时表现不佳。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Predicting Security Vulnerabilities using Source Code Metrics
Large open-source systems generate and operate on a plethora of sensitive enterprise data. Thus, security threats or vulnerabilities must not be present in open-source systems and must be resolved as early as possible in the development phases to avoid catastrophic consequences. One way to recognize security vulnerabilities is to predict them while developers write code to minimize costs and resources. This study examines the effectiveness of machine learning algorithms to predict potential security vulnerabilities by analyzing the source code of a system. We obtained the security vulnerabilities dataset from Apache Tomcat security reports for version 4.x to 10.x. We also collected the source code of Apache Tomcat 4.x to 10.x to compute 43 object-oriented metrics. We assessed four traditional supervised learning algorithms, i.e., Naive Bayes (NB), Decision Tree (DT), K-Nearest Neighbors (KNN), and Logistic Regression (LR), to understand their efficacy in predicting security vulnerabilities. We obtained the highest accuracy of 80.6% using the KNN. Thus, the KNN classifier was demonstrated to be the most effective of all the models we built. The DT classifier also performed well but under-performed when it came to multi-class classification.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信