A survey of coverage-guided greybox fuzzing with deep neural models

Impact factor 4.3; Zone 2, Computer Science; JCR Q2, Computer Science, Information Systems

Information and Software Technology. Pub Date: 2025-06-02. DOI: 10.1016/j.infsof.2025.107797

Junyang Qiu, Yupeng Jiang, Yuantian Miao, Wei Luo, Lei Pan, Xi Zheng

Volume 186, Article 107797. Full text: https://www.sciencedirect.com/science/article/pii/S0950584925001363

Citations: 0

Abstract



Coverage-guided greybox fuzzing (CGF) has emerged as a powerful technique for software vulnerability detection, yet traditional techniques often struggle with the increasing complexity of modern software systems and the vastness of input spaces. Deep neural networks (DNNs) have begun to fundamentally transform CGF by addressing these limitations through automated feature extraction, adaptive input generation, and intelligent path prioritization. However, despite these advancements, critical gaps persist in understanding the state-of-the-art landscape. Existing studies often lack rigorous benchmarks to evaluate scalability and generalizability, fail to address the interpretability of neural-guided decisions, and overlook the integration of emerging paradigms such as large language models (LLMs) and neurosymbolic reasoning. This survey systematically bridges these gaps by providing a comprehensive taxonomy of DNN-driven CGF techniques, analyzing their strengths and limitations across key fuzzing stages: seed generation, selection, and mutation. We find that although DNNs have significantly improved fuzzing efficiency, challenges such as semantically invalid seeds, high computational overhead, and limited cross-domain adaptability remain unresolved. Most importantly, we identify two transformative directions with the potential to redefine CGF: (1) LLM-powered fuzzing, which combines generative AI with domain-specific fine-tuning to produce context-aware inputs; and (2) neurosymbolic integration, which merges the precision of symbolic execution with the scalability of neural networks to tackle path explosion. By synthesizing these insights, this survey not only clarifies the state of the art but also outlines a roadmap for developing robust, explainable, and widely applicable intelligent fuzzers. The future of CGF lies in hybrid models that integrate data-driven learning with formal methods, paving the way for autonomous vulnerability discovery in an era of increasingly complex software systems.
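
For readers unfamiliar with the fuzzing stages the abstract names (seed selection, mutation, and coverage feedback), the loop can be sketched roughly as follows. This is a toy illustration only and is not taken from the survey: `target`, `mutate`, and `fuzz` are hypothetical names, the "program under test" is a hand-written function that reports which branches it reached, and uniform random seed selection stands in for the point where a learned model might rank seeds instead.

```python
import random


def target(data: bytes) -> set:
    """Toy program under test (hypothetical): returns the branch IDs it executed."""
    covered = {"entry"}
    if len(data) > 0 and data[0] == ord("F"):
        covered.add("b1")
        if len(data) > 1 and data[1] == ord("U"):
            covered.add("b2")
            if len(data) > 2 and data[2] == ord("Z"):
                covered.add("b3")
    return covered


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte with a random value. Seeds are assumed non-empty."""
    buf = bytearray(seed)
    pos = rng.randrange(len(buf))
    buf[pos] = rng.randrange(256)
    return bytes(buf)


def fuzz(initial_seeds, iterations, rng):
    """Minimal coverage-guided loop: keep a mutant only if it reaches a new branch."""
    corpus = list(initial_seeds)
    global_cov = set()
    for seed in corpus:
        global_cov |= target(seed)

    for _ in range(iterations):
        # Seed selection: uniform here; a neural scorer could be plugged in at this step.
        seed = rng.choice(corpus)
        # Mutation: a single byte overwrite; a generative model could propose inputs instead.
        candidate = mutate(seed, rng)
        # Coverage feedback: the greybox part, observing which branches ran.
        cov = target(candidate)
        if not cov <= global_cov:
            corpus.append(candidate)
            global_cov |= cov
    return corpus, global_cov


if __name__ == "__main__":
    corpus, cov = fuzz([b"AAA"], 20000, random.Random(0))
    print(len(corpus), sorted(cov))
```

Because a mutant is retained only when it adds coverage, the corpus grows by at most one entry per newly reached branch, which is what keeps the search focused on inputs that make progress.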


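Source journal

Information and Software Technology (Engineering and Technology; Computer Science: Software Engineering)

CiteScore: 9.10
Self-citation rate: 7.70%
Articles published: 164
Review time: 9.6 weeks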

About the journal: Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development

Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "negative" results and much more. Read the Guide for Authors for more information. The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.