Structural Semantic Enhancement: Better integrating code semantics for vulnerability detection

IF 4.3 · Zone 2 (Computer Science) · JCR Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

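Information and Software Technology, Volume 187, Article 107824 · Pub Date: 2025-07-23 · DOI: 10.1016/j.infsof.2025.107824 · https://www.sciencedirect.com/science/article/pii/S0950584925001636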

Shaohui Wang, Yan Wu, Zifeng Cui, Lin Chen

Citations: 0

Abstract

Code vulnerability detection is particularly critical in software development and maintenance because it may prevent software instability, data leakage, or more serious security threats. Traditional code vulnerability detection methods usually rely on static analysis. While static analysis covers the entire code base and detects early errors, it may struggle with highly complex code structures, leading to potential false positives or false negatives. Deep learning has introduced new opportunities for detecting vulnerabilities but faces challenges with complex code structures and logical relationships. Efforts to integrate natural language processing embeddings into models like Graph Neural Networks aim to enhance semantic understanding but depend on the quality of the NLP model and embeddings.

To address these challenges, we propose a methodology centered on the Structural Semantic Enhancement Method (SSEM), which combines the semantic understanding of deep learning with the structured code information provided by static analysis. Specifically, our method extracts key information from control flow graphs and data dependency graphs and designs a specialized SSEM with attention mechanisms. We validated the effectiveness of the proposed method experimentally on two large-scale datasets that include more than 40,000 code snippets. The results show that our method identifies potential vulnerabilities in code better than both traditional deep learning methods and advanced deep learning vulnerability detection models.
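As a rough illustration of how structural information from control flow and data dependency graphs might guide an attention mechanism, the sketch below restricts self-attention to statement pairs that are linked in either graph. It is not the authors' SSEM: the toy graph, the embeddings, and the helper names `structural_mask` and `masked_attention` are all assumptions made for this example.

```python
import math

# Hypothetical toy function with four statements (nodes 0..3).
# The edges are illustrative and not taken from the paper's datasets.
cfg_edges = [(0, 1), (1, 2), (1, 3)]  # control flow: 0 -> 1, 1 -> 2, 1 -> 3
ddg_edges = [(0, 2), (0, 3)]          # data dependency: value defined at 0, used at 2 and 3

# Hypothetical 2-d statement embeddings (stand-ins for a learned encoder's output).
embeddings = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]


def structural_mask(n, *edge_lists):
    """Allow attention between i and j only if i == j or the two nodes are
    linked (in either direction) by an edge in any of the given graphs."""
    mask = [[i == j for j in range(n)] for i in range(n)]
    for edges in edge_lists:
        for src, dst in edges:
            mask[src][dst] = True
            mask[dst][src] = True
    return mask


def masked_attention(x, mask):
    """Scaled dot-product self-attention (queries = keys = values = x) in which
    structurally unrelated pairs are excluded from the softmax."""
    dim = len(x[0])
    outputs, weights = [], []
    for i, query in enumerate(x):
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) if mask[i][j] else None
            for j, key in enumerate(x)
        ]
        peak = max(s for s in scores if s is not None)  # subtract for numerical stability
        exps = [math.exp(s - peak) if s is not None else 0.0 for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        outputs.append([sum(w * v[d] for w, v in zip(row, x)) for d in range(dim)])
    return outputs, weights


mask = structural_mask(len(embeddings), cfg_edges, ddg_edges)
enhanced, attn = masked_attention(embeddings, mask)
```

In this toy graph, statements 2 and 3 share no control flow or data dependency edge, so the attention weight between them is zero, while every row of `attn` still sums to one. A real model would use learned query, key, and value projections and graphs produced by a static analyzer.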

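Source journal

Information and Software Technology (Engineering Technology, Computer Science: Software Engineering)

CiteScore: 9.10
Self-citation rate: 7.70%
Articles published: 164
Review time: 9.6 weeks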

About the journal: Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development

Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "negative" results and much more. Read the Guide for authors for more information. The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.