Predicting Vulnerable Software Components through N-Gram Analysis and Statistical Feature Selection

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), Pub Date: 2015-12-01, DOI: 10.1109/ICMLA.2015.99

Yulei Pang, Xiaozhen Xue, A. Namin

Citations: 53

Abstract

Vulnerabilities need to be detected and removed from software. Although previous studies have demonstrated the usefulness of prediction techniques for deciding whether software components are vulnerable, improving the accuracy and effectiveness of these techniques remains a major open research question. This paper proposes a hybrid technique that combines N-gram analysis with feature selection algorithms to predict vulnerable software components, where features are defined as contiguous sequences of tokens in source code files, i.e., Java class files. Machine learning-based feature selection algorithms are then employed to reduce the feature and search space. We evaluated the proposed technique on a set of Java Android applications, and the results demonstrate that it can predict vulnerable classes, i.e., software components, with high precision, accuracy, and recall.
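The abstract does not specify the tokenizer, the value of N, or which feature selection algorithms were used. The Python sketch below illustrates the general pipeline under stated assumptions: a simple regular-expression tokenizer, token trigrams as features, and a chi-squared test on feature presence as a stand-in for the paper's statistical feature selection. The functions `tokenize`, `extract_features`, `chi2_score`, and `select_top_k` are hypothetical names for illustration, not the authors' code.

```python
import re
from collections import Counter

# Assumption: a simplified Java tokenizer that yields identifiers/keywords,
# integer literals, and single non-whitespace characters (operators, punctuation).
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")


def tokenize(source):
    """Split source code into a flat list of tokens."""
    return TOKEN_RE.findall(source)


def extract_features(source, n=3):
    """Count contiguous token N-grams (the candidate features) in one file."""
    tokens = tokenize(source)
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)


def chi2_score(doc_features, labels, feature):
    """Chi-squared statistic for a 2x2 table: feature present/absent vs.
    vulnerable/not vulnerable (an illustrative choice of statistic)."""
    a = b = c = d = 0
    for feats, vulnerable in zip(doc_features, labels):
        present = feature in feats
        if vulnerable and present:
            a += 1  # vulnerable, feature present
        elif vulnerable:
            c += 1  # vulnerable, feature absent
        elif present:
            b += 1  # not vulnerable, feature present
        else:
            d += 1  # not vulnerable, feature absent
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # A zero margin means the feature carries no information about the label.
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_top_k(sources, labels, k, n=3):
    """Rank all N-gram features by chi-squared score and keep the k highest."""
    doc_features = [set(extract_features(src, n)) for src in sources]
    vocabulary = set().union(*doc_features)
    scored = [(f, chi2_score(doc_features, labels, f)) for f in vocabulary]
    # Sort by descending score, then alphabetically for a deterministic order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy data: 1 = vulnerable class, 0 = not vulnerable.
    sources = [
        'String q = "SELECT * FROM t WHERE id=" + input; db.execute(q);',
        "int total = a + b; return total;",
    ]
    labels = [1, 0]
    for feature, score in select_top_k(sources, labels, k=3):
        print(f"{score:.1f}  {feature}")
```

Each class file becomes a set of N-grams; the chi-squared score measures how strongly the presence of an N-gram is associated with the vulnerable label, so keeping only the top k N-grams shrinks the feature space before a classifier is trained. A real evaluation would use many labeled classes and a proper Java lexer rather than this toy tokenizer.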


