Predicting Vulnerable Software Components through N-Gram Analysis and Statistical Feature Selection

Yulei Pang, Xiaozhen Xue, A. Namin
{"title":"Predicting Vulnerable Software Components through N-Gram Analysis and Statistical Feature Selection","authors":"Yulei Pang, Xiaozhen Xue, A. Namin","doi":"10.1109/ICMLA.2015.99","DOIUrl":null,"url":null,"abstract":"Vulnerabilities need to be detected and removed from software. Although previous studies demonstrated the usefulness of employing prediction techniques in deciding about vulnerabilities of software components, the accuracy and improvement of effectiveness of these prediction techniques is still a grand challenging research question. This paper proposes a hybrid technique based on combining N-gram analysis and feature selection algorithms for predicting vulnerable software components where features are defined as continuous sequences of token in source code files, i.e., Java class file. Machine learning-based feature selection algorithms are then employed to reduce the feature and search space. We evaluated the proposed technique based on some Java Android applications, and the results demonstrated that the proposed technique could predict vulnerable classes, i.e., software components, with high precision, accuracy and recall.","PeriodicalId":288427,"journal":{"name":"2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"53","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA.2015.99","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 53

Abstract

Vulnerabilities need to be detected and removed from software. Although previous studies demonstrated the usefulness of employing prediction techniques in deciding about vulnerabilities of software components, the accuracy and improvement of effectiveness of these prediction techniques is still a grand challenging research question. This paper proposes a hybrid technique based on combining N-gram analysis and feature selection algorithms for predicting vulnerable software components where features are defined as continuous sequences of token in source code files, i.e., Java class file. Machine learning-based feature selection algorithms are then employed to reduce the feature and search space. We evaluated the proposed technique based on some Java Android applications, and the results demonstrated that the proposed technique could predict vulnerable classes, i.e., software components, with high precision, accuracy and recall.
基于N-Gram分析和统计特征选择的易攻击软件组件预测
需要从软件中检测和删除漏洞。虽然以往的研究已经证明了利用预测技术来判断软件组件的漏洞是有用的,但这些预测技术的准确性和有效性的提高仍然是一个具有挑战性的研究问题。本文提出了一种基于N-gram分析和特征选择算法相结合的混合技术,用于预测易受攻击的软件组件,其中特征定义为源代码文件(即Java类文件)中的连续token序列。然后使用基于机器学习的特征选择算法来减少特征和搜索空间。基于Java Android应用程序对该技术进行了评估,结果表明,该技术能够预测软件组件的漏洞类别,具有较高的精密度、准确度和召回率。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信