Aidan Dakhama, Karine Even-Mendoza, W. B Langdon, Héctor D. Menéndez, Justyna Petke
DOI: 10.1007/s10515-025-00531-7
Journal: Automated Software Engineering, vol. 32, no. 2
Published: 2025-07-10 (Journal Article)
Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10515-025-00531-7.pdf
Article page: https://link.springer.com/article/10.1007/s10515-025-00531-7
JCR: Q3, Computer Science, Software Engineering
Enhancing search-based testing with LLMs for finding bugs in system simulators
Despite the wide availability of automated testing techniques such as fuzzing, little attention has been devoted to testing computer architecture simulators. We propose a fully automated approach for this task. Our approach uses large language models (LLMs) to generate input programs, including information about their parameters and types, as test cases for the simulators. The LLM's output becomes the initial seed for an existing fuzzer, AFL++, which we enhanced with three mutation operators targeting both the input binary program and its parameters. We implement our approach in a tool called SearchSYS and use it to test the gem5 system simulator. SearchSYS discovered 21 new bugs in gem5: 14 where gem5's software prediction differs from the behaviour of the actual hardware, and 7 where the simulator crashed. New defects were uncovered with each of the 6 LLMs used.
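The abstract does not spell out the three mutation operators, but it notes they target both the input binary program and its simulator parameters. As a rough illustration of that idea only, the sketch below shows two hypothetical operators in Python: a byte-level bit flip applied to the raw binary, and a substitution over a pool of candidate parameter values. The function names and the parameter pool are assumptions for illustration, not the authors' implementation.

```python
import random

def mutate_binary(data: bytes, rng: random.Random) -> bytes:
    """Binary-level operator: flip one random bit in the input program."""
    buf = bytearray(data)
    i = rng.randrange(len(buf))        # pick a byte position
    buf[i] ^= 1 << rng.randrange(8)    # flip one bit within that byte
    return bytes(buf)

def mutate_params(params: list[str], pool: list[str],
                  rng: random.Random) -> list[str]:
    """Parameter-level operator: swap one parameter for a pool value."""
    out = list(params)
    out[rng.randrange(len(out))] = rng.choice(pool)
    return out

rng = random.Random(0)
seed_binary = b"\x7fELF\x02\x01\x01\x00"          # ELF header prefix as a toy seed
seed_params = ["--cpu-type=TimingSimpleCPU"]       # hypothetical simulator flag
mutant_binary = mutate_binary(seed_binary, rng)
mutant_params = mutate_params(seed_params,
                              ["--cpu-type=O3CPU", "--caches"], rng)
```

In a pipeline like the one described, such mutants would be handed back to the fuzzer as new test cases, with the simulator's behaviour on each mutant compared against real hardware to flag divergences.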
About the journal:
This journal publishes research papers, tutorials, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes.
Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences and workshops.