Enhancing search-based testing with LLMs for finding bugs in system simulators

IF 3.1 | Zone 2 (Computer Science) | JCR Q3: COMPUTER SCIENCE, SOFTWARE ENGINEERING
Aidan Dakhama, Karine Even-Mendoza, W. B. Langdon, Héctor D. Menéndez, Justyna Petke
DOI: 10.1007/s10515-025-00531-7
Journal: Automated Software Engineering 32(2)
Published: 2025-07-10
Article page: https://link.springer.com/article/10.1007/s10515-025-00531-7
PDF (not open access): https://link.springer.com/content/pdf/10.1007/s10515-025-00531-7.pdf
Citations: 0

Abstract

Enhancing search-based testing with LLMs for finding bugs in system simulators

Despite the wide availability of automated testing techniques such as fuzzing, little attention has been devoted to testing computer architecture simulators. We propose a fully automated approach for this task. Our approach uses large language models (LLMs) to generate input programs, including information about their parameters and types, as test cases for the simulators. The LLM's output becomes the initial seed for an existing fuzzer, AFL++, which has been enhanced with three mutation operators targeting both the input binary program and its parameters. We implement our approach in a tool called SearchSYS and use it to test the gem5 system simulator. SearchSYS discovered 21 new bugs in gem5: 14 where gem5's software prediction differs from the real behaviour on actual hardware, and 7 where it crashed. New defects were uncovered with each of the 6 LLMs used.
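
The abstract names two ingredients that a small example can make concrete: mutation of a program's typed parameters, and an oracle that separates simulator crashes from cases where simulated behaviour differs from native behaviour. The following is a minimal Python sketch under stated assumptions, not the SearchSYS implementation. The `Arg` representation, `mutate_args`, `RunResult`, and `classify` are all hypothetical names introduced for illustration, and the runners are plain callables standing in for a native execution and a simulated one.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical typed argument: the abstract says the LLM supplies parameter
# types alongside each program; this concrete representation is an assumption.
Arg = Tuple[str, str]  # (type_name, value), e.g. ("int", "42")


def mutate_args(args: Sequence[Arg], rng: random.Random) -> List[Arg]:
    """Mutate one argument in a type-aware way (illustrative operator only)."""
    mutated = list(args)
    if not mutated:
        return mutated
    i = rng.randrange(len(mutated))
    type_name, value = mutated[i]
    if type_name == "int":
        # Replace integers with boundary values that often expose bugs.
        value = str(rng.choice([0, -1, 1, 2**31 - 1, -(2**31)]))
    else:
        # Treat everything else as a string and duplicate its tail.
        cut = rng.randrange(len(value) + 1)
        value = value[:cut] + value[cut:] * 2
    mutated[i] = (type_name, value)
    return mutated


@dataclass
class RunResult:
    exit_code: int
    stdout: str


# A runner takes the argument vector and returns what the program produced.
Runner = Callable[[Sequence[str]], RunResult]


def classify(native: Runner, simulated: Runner, args: Sequence[Arg]) -> str:
    """Compare a native run with a simulated run of the same program."""
    argv = [value for _, value in args]
    expected = native(argv)
    try:
        actual = simulated(argv)
    except Exception:
        # The stand-in simulator itself failed: report it as a crash.
        return "crash"
    if actual != expected:
        # Both ran, but the simulated behaviour differs from the native one.
        return "mismatch"
    return "ok"
```

In a real setup the two runners would wrap process launches (one on hardware, one under the simulator), and the mutation would be driven by the fuzzer rather than called directly; both of those details are outside what the abstract specifies.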

Source journal
Automated Software Engineering (Engineering and Technology: Computer Science, Software Engineering)
CiteScore: 4.80
Self-citation rate: 11.80%
Articles published: 51
Review time: >12 weeks
About the journal: This journal details research, tutorial papers, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes. Coverage in Automated Software Engineering examines both automatic systems and collaborative systems, as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences, and workshops.