The machines are watching: Exploring the potential of Large Language Models for detecting Algorithmically Generated Domains

IF 3.7 | Zone 2, Computer Science | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Information Security and Applications Pub Date : 2025-08-11 DOI:10.1016/j.jisa.2025.104176

Tomás Pelayo-Benedet , Ricardo J. Rodríguez , Carlos H. Gañán

Volume 93, Article 104176. Full text: https://www.sciencedirect.com/science/article/pii/S2214212625002133

Citations: 0

Abstract

Algorithmically Generated Domains (AGDs) are integral to many modern malware campaigns, allowing adversaries to establish resilient command and control channels. While machine learning techniques are increasingly employed to detect AGDs, the potential of Large Language Models (LLMs) in this domain remains largely underexplored. In this paper, we examine the ability of nine commercial LLMs to identify malicious AGDs, without parameter tuning or domain-specific training. We evaluate zero-shot and few-shot learning approaches, using minimal labeled examples and diverse datasets with multiple prompt strategies. Our results show that certain LLMs can achieve detection accuracy between 77.3% and 89.3%. In a 10-shot classification setting, the largest models excel at distinguishing between malware families, particularly those employing hash-based generation schemes, underscoring the promise of LLMs for advanced threat detection. However, significant limitations arise when these models encounter real-world DNS traffic. Performance degradation on benign but structurally suspect domains highlights the risk of false positives in operational environments. This shortcoming has real-world consequences for security practitioners, given the need to avoid erroneous domain blocking that disrupts legitimate services. Our findings underscore the practicality of LLM-driven AGD detection, while emphasizing key areas where future research is needed (such as more robust warning design and model refinement) to ensure reliability in production environments.
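To make the zero-shot and few-shot settings mentioned in the abstract concrete, the following is a minimal Python sketch of how labeled examples might be packed into a classification prompt. It is not the authors' prompt or pipeline: the prompt wording, the label names `benign` and `agd`, the sample domains, and the `ask_model` callable (a stand-in for whatever LLM client is used) are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

# (domain, label) pairs; the label names are hypothetical, not from the paper.
Example = Tuple[str, str]


def build_few_shot_prompt(examples: Sequence[Example], domain: str) -> str:
    """Build a classification prompt; an empty `examples` gives a zero-shot prompt."""
    lines = [
        "Classify each domain as 'benign' or 'agd' (algorithmically generated).",
        "Answer with a single word.",
        "",
    ]
    for ex_domain, ex_label in examples:
        lines.append(f"Domain: {ex_domain}")
        lines.append(f"Label: {ex_label}")
        lines.append("")
    # The query domain goes last, with the label left for the model to fill in.
    lines.append(f"Domain: {domain}")
    lines.append("Label:")
    return "\n".join(lines)


def parse_label(response: str) -> str:
    """Map a free-text model reply onto one of the expected labels."""
    text = response.strip().lower()
    if text.startswith("agd"):
        return "agd"
    if text.startswith("benign"):
        return "benign"
    return "unknown"  # anything else is treated as unparseable


def classify(domain: str, examples: Sequence[Example],
             ask_model: Callable[[str], str]) -> str:
    """Send the prompt through a caller-supplied model function and parse the reply."""
    return parse_label(ask_model(build_few_shot_prompt(examples, domain)))


if __name__ == "__main__":
    # Illustrative, made-up examples; a 10-shot setting would supply ten of these.
    shots = [("example.org", "benign"), ("xkqjzvwpyt.com", "agd")]
    print(build_few_shot_prompt(shots, "qwxzvk.net"))
```

Keeping `ask_model` as a plain callable leaves the choice of model and client library open, and returning `unknown` for unparseable replies avoids silently counting them as either class when accuracy or false positives are computed.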

Source journal

Journal of Information Security and Applications Computer Science-Computer Networks and Communications

CiteScore: 10.90
Self-citation rate: 5.40%
Articles published: 206
Review time: 56 days

Journal description: Journal of Information Security and Applications (JISA) focuses on original research and practice-driven applications relevant to information security. JISA links a vibrant scientific and research community with industry professionals by offering a clear view of modern problems and challenges in information security, and by identifying promising scientific and "best-practice" solutions. JISA issues balance original research work and innovative industrial approaches by internationally renowned information security experts and researchers.