ML-FEED: Machine Learning Framework for Efficient Exploit Detection

2022 IEEE 4th International Conference on Trust, Privacy and Security in Intelligent Systems, and Applications (TPS-ISA)
Publication date: 2022-12-01
DOI: 10.1109/TPS-ISA56441.2022.00027

Tanujay Saha, Tamjid Al-Rahat, N. Aaraj, Yuan Tian, N. Jha

Citations: 1

Abstract

Machine learning (ML)-based methods have recently become attractive for detecting security vulnerability exploits. Unfortunately, state-of-the-art ML models such as long short-term memory (LSTM) networks and transformers incur significant computational overheads, which make them infeasible to deploy in real-time environments. We propose ML-FEED, a novel ML-based exploit detection model that enables highly efficient inference without sacrificing performance. First, we develop a novel automated technique to extract vulnerability patterns from the Common Weakness Enumeration (CWE) and Common Vulnerabilities and Exposures (CVE) databases, which keeps ML-FEED aware of the latest cyber weaknesses. Second, ML-FEED does not follow the traditional approach of classifying sequences of application programming interface (API) calls into exploit categories; such methods process entire sequences and therefore incur huge computational overheads. Instead, ML-FEED operates at a finer granularity and predicts the exploits triggered by every API call in the program trace. It then uses a state table to update the states of these potential exploits and track the progress of potential exploit chains. ML-FEED also employs a feature engineering approach that combines natural language processing (NLP)-based word embeddings, frequency vectors, and one-hot encoding to detect semantically similar instruction calls. After prediction, it updates the states of the predicted exploit categories and triggers an alarm when a vulnerability fingerprint executes. Our experiments show that ML-FEED is 72.9× and 75,828.9× faster than state-of-the-art lightweight LSTM and transformer models, respectively. We trained and tested ML-FEED on 79 real-world exploit categories. It predicts exploit categories in real time with 98.2% precision, 97.4% recall, and a 97.8% F1 score, outperforming the LSTM and transformer baselines. In addition, we evaluated ML-FEED on the attack traces of CVE vulnerability exploits in three popular Java libraries, and it detected all three reported critical vulnerabilities in them.
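
The per-call prediction and state-table tracking described in the abstract can be illustrated with a small sketch. This is an illustration under stated assumptions, not the authors' implementation. The fingerprint chains (`FINGERPRINTS`), the Java-style API call names, and the helpers `predict_categories`, `StateTable`, and `scan` are all hypothetical. The learned per-call classifier is also replaced by a simple membership lookup. The point of the sketch is that each incoming call triggers only a small state update, so the trace is never reprocessed as a whole sequence.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL vulnerability fingerprints: each exploit category maps to an
# ordered chain of API calls. ML-FEED derives its patterns from CWE/CVE data;
# these category and call names are illustrative placeholders only.
FINGERPRINTS: dict[str, list[str]] = {
    "path_traversal": ["getParameter", "File.<init>", "FileInputStream.<init>"],
    "deserialization": ["getInputStream", "ObjectInputStream.<init>", "readObject"],
}


def predict_categories(api_call: str) -> set[str]:
    """Stand-in for the per-call classifier (an assumption): returns every
    category whose fingerprint contains this call. ML-FEED instead uses a
    learned model over word embeddings, frequency vectors and one-hot features."""
    return {cat for cat, chain in FINGERPRINTS.items() if api_call in chain}


@dataclass
class StateTable:
    # Number of fingerprint steps matched so far, per exploit category.
    progress: dict[str, int] = field(default_factory=dict)

    def update(self, api_call: str) -> list[str]:
        """Advance the state of each predicted category and return the
        categories whose full fingerprint has now executed (alarms)."""
        alarms = []
        for cat in sorted(predict_categories(api_call)):
            chain = FINGERPRINTS[cat]
            step = self.progress.get(cat, 0)
            if chain[step] == api_call:  # the next expected call in the chain
                step += 1
                if step == len(chain):
                    alarms.append(cat)
                    step = 0  # reset so a later chain can be tracked again
                self.progress[cat] = step
        return alarms


def scan(trace: list[str]) -> list[tuple[int, str]]:
    """Process a trace one call at a time and return (index, category) alarms."""
    table = StateTable()
    hits = []
    for i, call in enumerate(trace):
        for cat in table.update(call):
            hits.append((i, cat))
    return hits
```

For example, `scan(["getParameter", "toString", "File.<init>", "FileInputStream.<init>"])` returns `[(3, "path_traversal")]`. The unrelated `toString` call leaves the state untouched, and the alarm fires at index 3, when the last step of the chain executes. This sketch advances a chain only on its next expected call and resets it after an alarm. The abstract does not specify how ML-FEED handles interleaved or out-of-order calls.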

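The three reported detection metrics are mutually consistent, because the F1 score is the harmonic mean of precision P and recall R. The check below plugs in the reported values. It assumes F1 was computed from these aggregate precision and recall figures; the abstract does not state how results are averaged across the 79 categories.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.982 \times 0.974}{0.982 + 0.974}
    = \frac{1.912936}{1.956}
    \approx 0.978
```
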