VulMPFF: A Vulnerability Detection Method for Fusing Code Features in Multiple Perspectives

IF 2.6 | Zone 4, Computer Science | Q3, COMPUTER SCIENCE, INFORMATION SYSTEMS

IET Information Security | Pub Date: 2024-03-22 | DOI: 10.1049/2024/4313185

Xiansheng Cao, Junfeng Wang, Peng Wu, Zhiyang Fang


Citations: 0

Abstract

Source code vulnerabilities are a significant threat to software security. Existing deep learning-based detection methods have proven effective. However, most of them extract code information from a single intermediate representation of code (IRC), which often fails to fully capture the different kinds of information hidden in the code and significantly limits their performance. To address this problem, we propose VulMPFF, a vulnerability detection method that fuses code features from multiple perspectives. It extracts IRCs from three perspectives (code sequence, lexical and syntactic relations, and graph structure) to capture the vulnerability information in the code, so that the IRCs complement one another and vulnerability detection performance improves. Specifically, VulMPFF extracts a serialized abstract syntax tree as the IRC for the code sequence and the lexical and syntactic relation perspectives, and a code property graph as the IRC for the graph structure perspective. It then uses a Bi-LSTM model with an attention mechanism and a graph neural network with an attention mechanism, respectively, to learn code features from these perspectives, and fuses the learned features to detect vulnerabilities in the code. We design a dual-attention mechanism to highlight the code information that is critical to triggering vulnerabilities and thereby better accomplish the vulnerability detection task. We evaluate our approach on three datasets. Experiments show that VulMPFF outperforms existing state-of-the-art vulnerability detection methods (i.e., Rats, FlawFinder, VulDeePecker, SySeVR, Devign, and Reveal) in accuracy (Acc) and F1 score, with improvements ranging from 14.71% to 145.78% and from 152.08% to 344.77%, respectively. Meanwhile, experiments on an open-source project demonstrate that VulMPFF has the potential to detect vulnerabilities in real-world environments.
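The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each perspective has already produced a list of feature vectors (stand-ins for Bi-LSTM hidden states over the serialized abstract syntax tree and for graph neural network node embeddings over the code property graph), that each perspective gets its own dot-product attention pooling, and that fusion is plain concatenation followed by a logistic classifier. The abstract does not specify the attention form, the fusion operator, or the classifier, so `attention_pool`, `fuse_and_score`, and all parameters below are hypothetical names chosen for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states: Sequence[Vector], query: Vector) -> Vector:
    """Collapse a list of state vectors into one vector.

    Each state is scored by its dot product with `query` (a stand-in for a
    learned attention vector), and the states are averaged with the softmax
    of those scores as weights.
    """
    scores = [sum(h * q for h, q in zip(state, query)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    return [sum(w * state[i] for w, state in zip(weights, states)) for i in range(dim)]


def fuse_and_score(
    seq_states: Sequence[Vector],    # assumed: per-token states from the sequence view
    graph_states: Sequence[Vector],  # assumed: per-node embeddings from the graph view
    seq_query: Vector,
    graph_query: Vector,
    clf_weights: Vector,
    clf_bias: float,
) -> float:
    """Pool each view with its own attention, concatenate, and score."""
    seq_vec = attention_pool(seq_states, seq_query)
    graph_vec = attention_pool(graph_states, graph_query)
    fused = seq_vec + graph_vec  # list concatenation (assumed fusion operator)
    logit = sum(w * x for w, x in zip(clf_weights, fused)) + clf_bias
    return 1.0 / (1.0 + math.exp(-logit))  # probability-like vulnerability score


if __name__ == "__main__":
    # Toy inputs only; real features would come from trained models.
    seq = [[0.2, 0.1], [0.9, 0.4], [0.1, 0.0]]
    graph = [[0.5, 0.3], [0.7, 0.8]]
    score = fuse_and_score(seq, graph, [1.0, 0.0], [0.0, 1.0],
                           [0.5, -0.2, 0.3, 0.1], 0.0)
    print(f"vulnerability score: {score:.3f}")
```

Keeping a separate attention query per perspective lets each view emphasize its own salient tokens or nodes before the two vectors are combined, which is one plausible reading of the dual-attention idea; a trained system would learn the queries and classifier weights rather than take them as fixed arguments.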



Source journal

IET Information Security (Engineering & Technology - Computer Science: Theory & Methods)

CiteScore

3.80

Self-citation rate

7.10%

Articles published

Review time

8.6 months

Journal description: IET Information Security publishes original research papers in the following areas of information security and cryptography. Submitting authors should specify clearly in their covering statement the area into which their paper falls. Scope: Access Control and Database Security; Ad-Hoc Network Aspects; Anonymity and E-Voting; Authentication; Block Ciphers and Hash Functions; Blockchain, Bitcoin (Technical aspects only); Broadcast Encryption and Traitor Tracing; Combinatorial Aspects; Covert Channels and Information Flow; Critical Infrastructures; Cryptanalysis; Dependability; Digital Rights Management; Digital Signature Schemes; Digital Steganography; Economic Aspects of Information Security; Elliptic Curve Cryptography and Number Theory; Embedded Systems Aspects; Embedded Systems Security and Forensics; Financial Cryptography; Firewall Security; Formal Methods and Security Verification; Human Aspects; Information Warfare and Survivability; Intrusion Detection; Java and XML Security; Key Distribution; Key Management; Malware; Multi-Party Computation and Threshold Cryptography; Peer-to-peer Security; PKIs; Public-Key and Hybrid Encryption; Quantum Cryptography; Risks of using Computers; Robust Networks; Secret Sharing; Secure Electronic Commerce; Software Obfuscation; Stream Ciphers; Trust Models; Watermarking and Fingerprinting; Special Issues. Current Call for Papers: Security on Mobile and IoT devices - https://digital-library.theiet.org/files/IET_IFS_SMID_CFP.pdf