ML-FEED: Machine Learning Framework for Efficient Exploit Detection

Tanujay Saha, Tamjid Al-Rahat, N. Aaraj, Yuan Tian, N. Jha
Venue: 2022 IEEE 4th International Conference on Trust, Privacy and Security in Intelligent Systems, and Applications (TPS-ISA)
Published: 2022-12-01
DOI: 10.1109/TPS-ISA56441.2022.00027 (https://doi.org/10.1109/TPS-ISA56441.2022.00027)
Citations: 1

Abstract

Machine learning (ML)-based methods have recently become attractive for detecting security vulnerability exploits. Unfortunately, state-of-the-art ML models like long short-term memories (LSTMs) and transformers incur significant computation overheads, which makes it infeasible to deploy them in real-time environments. We propose a novel ML-based exploit detection model, ML-FEED, that enables highly efficient inference without sacrificing performance. First, we develop a novel automated technique to extract vulnerability patterns from the Common Weakness Enumeration (CWE) and Common Vulnerabilities and Exposures (CVE) databases. This feature keeps ML-FEED aware of the latest cyber weaknesses. Second, ML-FEED is not based on the traditional approach of classifying sequences of application programming interface (API) calls into exploit categories. Such traditional methods process entire sequences and therefore incur huge computational overheads. Instead, ML-FEED operates at a finer granularity and predicts the exploits triggered by every API call of the program trace. It then uses a state table to update the states of these potential exploits and track the progress of potential exploit chains. ML-FEED also employs a feature engineering approach that uses natural language processing-based word embeddings, frequency vectors, and one-hot encoding to detect semantically similar instruction calls. Finally, it updates the states of the predicted exploit categories and triggers an alarm when a vulnerability fingerprint executes. Our experiments show that ML-FEED is 72.9× and 75,828.9× faster than state-of-the-art lightweight LSTM and transformer models, respectively. We trained and tested ML-FEED on 79 real-world exploit categories. It predicts exploit categories in real time with 98.2% precision, 97.4% recall, and 97.8% F1 score. These results also outperform the LSTM and transformer baselines. In addition, we evaluated ML-FEED on the attack traces of CVE vulnerability exploits in three popular Java libraries and detected all three reported critical vulnerabilities in them.
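
The per-call prediction and state-table idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the fingerprints, the API-call names, and the lookup table standing in for the trained per-call predictor are all hypothetical, chosen only to show how a state table can track exploit-chain progress one call at a time instead of re-processing the whole sequence.

```python
from dataclasses import dataclass, field

# Hypothetical fingerprints: each exploit category is an ordered list of
# abstract steps. These are illustrative and not taken from the paper.
FINGERPRINTS = {
    "deserialization": ["read_untrusted_input", "deserialize_object", "invoke_method"],
    "path_traversal": ["read_untrusted_input", "open_file"],
}

# Assumption: a plain lookup table stands in for the per-call ML predictor.
# A trained model would replace this mapping from API call to abstract step.
CALL_TO_STEP = {
    "Socket.getInputStream": "read_untrusted_input",
    "ObjectInputStream.readObject": "deserialize_object",
    "Method.invoke": "invoke_method",
    "FileInputStream.<init>": "open_file",
}


def predict_step(api_call):
    """Return the abstract step predicted for one API call, or None."""
    return CALL_TO_STEP.get(api_call)


@dataclass
class StateTable:
    # progress[name] is the index of the next fingerprint step still expected.
    progress: dict = field(default_factory=lambda: {name: 0 for name in FINGERPRINTS})

    def update(self, api_call):
        """Advance matching exploit chains; return categories that completed."""
        step = predict_step(api_call)
        alarms = []
        if step is None:
            return alarms
        for name, steps in FINGERPRINTS.items():
            idx = self.progress[name]
            if steps[idx] == step:
                idx += 1
                if idx == len(steps):
                    alarms.append(name)  # the whole fingerprint has executed
                    idx = 0              # reset so a later chain can be tracked
                self.progress[name] = idx
        return alarms


def scan(trace):
    """Process a trace call by call; return (call index, category) alarms."""
    table = StateTable()
    alarms = []
    for i, call in enumerate(trace):
        for name in table.update(call):
            alarms.append((i, name))
    return alarms
```

Calling `scan` on a list of API-call names returns the position and category of every completed fingerprint. Each call costs work proportional to the number of tracked categories rather than to the trace length, which is the property the abstract contrasts with whole-sequence classifiers.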