Aidan Dakhama, Karine Even-Mendoza, W. B Langdon, Héctor D. Menéndez, Justyna Petke
DOI: 10.1007/s10515-025-00531-7
Journal: Automated Software Engineering, vol. 32, no. 2
Published: 2025-07-10 (Journal Article)
Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10515-025-00531-7.pdf
Article page: https://link.springer.com/article/10.1007/s10515-025-00531-7
JCR: Q3, Computer Science, Software Engineering
Enhancing search-based testing with LLMs for finding bugs in system simulators
Despite the wide availability of automated testing techniques such as fuzzing, little attention has been devoted to testing computer architecture simulators. We propose a fully automated approach for this task. Our approach uses large language models (LLMs) to generate input programs, including information about their parameters and types, as test cases for the simulators. The LLM's output becomes the initial seed for an existing fuzzer, AFL++, which we enhanced with three mutation operators targeting both the input binary program and its parameters. We implement our approach in a tool called SearchSYS and use it to test the gem5 system simulator. SearchSYS discovered 21 new bugs in gem5: 14 where gem5's software prediction differs from the behaviour of the actual hardware, and 7 where the simulator crashed. New defects were uncovered with each of the 6 LLMs used.
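The abstract does not spell out the three mutation operators, but it notes they target both the input binary program and its simulator parameters. As a rough illustration of that idea only, the sketch below shows two hypothetical operators in Python: a byte-level bit flip applied to the raw binary, and a substitution over a pool of candidate parameter values. The function names and the parameter pool are assumptions for illustration, not the authors' implementation.

```python
import random

def mutate_binary(data: bytes, rng: random.Random) -> bytes:
    """Binary-level operator: flip one random bit in the input program."""
    buf = bytearray(data)
    i = rng.randrange(len(buf))        # pick a byte position
    buf[i] ^= 1 << rng.randrange(8)    # flip one bit within that byte
    return bytes(buf)

def mutate_params(params: list[str], pool: list[str],
                  rng: random.Random) -> list[str]:
    """Parameter-level operator: swap one parameter for a pool value."""
    out = list(params)
    out[rng.randrange(len(out))] = rng.choice(pool)
    return out

rng = random.Random(0)
seed_binary = b"\x7fELF\x02\x01\x01\x00"          # ELF header prefix as a toy seed
seed_params = ["--cpu-type=TimingSimpleCPU"]       # hypothetical simulator flag
mutant_binary = mutate_binary(seed_binary, rng)
mutant_params = mutate_params(seed_params,
                              ["--cpu-type=O3CPU", "--caches"], rng)
```

In a pipeline like the one described, such mutants would be handed back to the fuzzer as new test cases, with the simulator's behaviour on each mutant compared against real hardware to flag divergences.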
About the journal:
This journal publishes research papers, tutorials, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes.
Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences and workshops.