A formal framework for LLM-assisted automated generation of Zeek signatures from binary artifacts

IF 6.2 2区计算机科学 Q1 COMPUTER SCIENCE, THEORY & METHODS

Future Generation Computer Systems-The International Journal of Escience Pub Date : 2025-08-16 DOI:10.1016/j.future.2025.108086

Claudia Greco , Michele Ianni

{"title":"A formal framework for LLM-assisted automated generation of Zeek signatures from binary artifacts","authors":"Claudia Greco , Michele Ianni","doi":"10.1016/j.future.2025.108086","DOIUrl":null,"url":null,"abstract":"<div><div>Designing semantically meaningful and operationally effective intrusion detection signatures remains a labor-intensive and expertise-driven task, particularly within the Zeek network monitoring framework. In this paper, we introduce a formalized and modular system for automating Zeek signature generation using Large Language Models (LLMs). Our pipeline begins with static analysis of binary artifacts, extracts salient behavioral features, and transforms them into structured prompts for an LLM tasked with synthesizing Zeek scripts. We provide a rigorous formal framework that defines each stage of this transformation, along with theoretical models for prompt distortion, injection resilience, and sanitization. Furthermore, we explore the adversarial surface exposed by LLMs—introducing a taxonomy of injection attacks, prompt inversion risks, and behavioral feedback loops—and propose mitigations grounded in filtering and robust prompt engineering. Our approach not only accelerates signature creation but also enhances interpretability and adaptability in evolving threat environments. The framework lays the groundwork for future extensions involving dynamic analysis and automated post-validation of generated signatures.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"175 ","pages":"Article 108086"},"PeriodicalIF":6.2000,"publicationDate":"2025-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Generation Computer Systems-The International Journal of Escience","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167739X25003802","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}

引用次数: 0

Abstract

Designing semantically meaningful and operationally effective intrusion detection signatures remains a labor-intensive and expertise-driven task, particularly within the Zeek network monitoring framework. In this paper, we introduce a formalized and modular system for automating Zeek signature generation using Large Language Models (LLMs). Our pipeline begins with static analysis of binary artifacts, extracts salient behavioral features, and transforms them into structured prompts for an LLM tasked with synthesizing Zeek scripts. We provide a rigorous formal framework that defines each stage of this transformation, along with theoretical models for prompt distortion, injection resilience, and sanitization. Furthermore, we explore the adversarial surface exposed by LLMs—introducing a taxonomy of injection attacks, prompt inversion risks, and behavioral feedback loops—and propose mitigations grounded in filtering and robust prompt engineering. Our approach not only accelerates signature creation but also enhances interpretability and adaptability in evolving threat environments. The framework lays the groundwork for future extensions involving dynamic analysis and automated post-validation of generated signatures.

查看原文本刊更多论文

一个正式的框架，llm辅助从二进制工件自动生成Zeek签名

设计语义上有意义和操作上有效的入侵检测签名仍然是一项劳动密集型和专业知识驱动的任务，特别是在Zeek网络监控框架中。在本文中，我们介绍了一个形式化和模块化的系统，用于使用大型语言模型（llm）自动生成Zeek签名。我们的管道从二进制工件的静态分析开始，提取显著的行为特征，并将它们转换为结构化的提示，供负责合成Zeek脚本的LLM使用。我们提供了一个严格的正式框架，定义了这种转变的每个阶段，以及提示变形、注射弹性和消毒的理论模型。此外，我们探索了llms暴露的对抗表面，介绍了注入攻击的分类、提示反转风险和行为反馈回路，并提出了基于过滤和鲁棒提示工程的缓解措施。我们的方法不仅加快了签名的生成速度，而且提高了签名在不断变化的威胁环境中的可解释性和适应性。该框架为涉及动态分析和生成签名的自动后验证的未来扩展奠定了基础。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Future Generation Computer Systems-The International Journal of Escience 工程技术-计算机：理论方法

CiteScore

19.90

自引率

2.70%

发文量

376

审稿时长

10.6 months

期刊介绍： Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.