{"title":"MILE:情境学习系统的突变测试框架","authors":"Zeming Wei, Yihao Zhang, Meng Sun","doi":"arxiv-2409.04831","DOIUrl":null,"url":null,"abstract":"In-context Learning (ICL) has achieved notable success in the applications of\nlarge language models (LLMs). By adding only a few input-output pairs that\ndemonstrate a new task, the LLM can efficiently learn the task during inference\nwithout modifying the model parameters. Such mysterious ability of LLMs has\nattracted great research interests in understanding, formatting, and improving\nthe in-context demonstrations, while still suffering from drawbacks like\nblack-box mechanisms and sensitivity against the selection of examples. In this\nwork, inspired by the foundations of adopting testing techniques in machine\nlearning (ML) systems, we propose a mutation testing framework designed to\ncharacterize the quality and effectiveness of test data for ICL systems. First,\nwe propose several mutation operators specialized for ICL demonstrations, as\nwell as corresponding mutation scores for ICL test sets. With comprehensive\nexperiments, we showcase the effectiveness of our framework in evaluating the\nreliability and quality of ICL test suites. Our code is available at\nhttps://github.com/weizeming/MILE.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"8 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MILE: A Mutation Testing Framework of In-Context Learning Systems\",\"authors\":\"Zeming Wei, Yihao Zhang, Meng Sun\",\"doi\":\"arxiv-2409.04831\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In-context Learning (ICL) has achieved notable success in the applications of\\nlarge language models (LLMs). By adding only a few input-output pairs that\\ndemonstrate a new task, the LLM can efficiently learn the task during inference\\nwithout modifying the model parameters. Such mysterious ability of LLMs has\\nattracted great research interests in understanding, formatting, and improving\\nthe in-context demonstrations, while still suffering from drawbacks like\\nblack-box mechanisms and sensitivity against the selection of examples. In this\\nwork, inspired by the foundations of adopting testing techniques in machine\\nlearning (ML) systems, we propose a mutation testing framework designed to\\ncharacterize the quality and effectiveness of test data for ICL systems. First,\\nwe propose several mutation operators specialized for ICL demonstrations, as\\nwell as corresponding mutation scores for ICL test sets. With comprehensive\\nexperiments, we showcase the effectiveness of our framework in evaluating the\\nreliability and quality of ICL test suites. 
Our code is available at\\nhttps://github.com/weizeming/MILE.\",\"PeriodicalId\":501278,\"journal\":{\"name\":\"arXiv - CS - Software Engineering\",\"volume\":\"8 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Software Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.04831\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.04831","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
MILE: A Mutation Testing Framework of In-Context Learning Systems
In-context learning (ICL) has achieved notable success in applications of large language models (LLMs). By adding only a few input-output pairs that demonstrate a new task, an LLM can learn the task during inference without any modification of its parameters. This intriguing ability of LLMs has attracted great research interest in understanding, formatting, and improving in-context demonstrations, yet ICL still suffers from drawbacks such as its black-box mechanism and its sensitivity to the choice of examples. In this work, inspired by the adoption of software testing techniques in machine learning (ML) systems, we propose a mutation testing framework designed to characterize the quality and effectiveness of test data for ICL systems. We first propose several mutation operators specialized for ICL demonstrations, together with corresponding mutation scores for ICL test sets. Through comprehensive experiments, we demonstrate the effectiveness of our framework in evaluating the reliability and quality of ICL test suites. Our code is available at https://github.com/weizeming/MILE.
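
To make the idea concrete, below is a minimal Python sketch of mutation testing applied to an ICL system: mutation operators perturb the demonstration set, and a mutation score counts how many mutants the test inputs detect. The operator names, the model interface, and the kill criterion here are illustrative assumptions, not the paper's actual definitions; see the MILE repository for the real operators and scores.

# Illustrative sketch of mutation testing for an ICL system.
# Assumptions: the `model` callable, the two operators, and the kill
# criterion below are hypothetical stand-ins, not MILE's definitions.
import random
from typing import Callable, List, Tuple

Demo = Tuple[str, str]  # an (input, output) demonstration pair

def shuffle_demos(demos: List[Demo]) -> List[Demo]:
    """Mutation operator: reorder the demonstrations."""
    mutant = demos[:]
    random.shuffle(mutant)
    return mutant

def swap_label(demos: List[Demo]) -> List[Demo]:
    """Mutation operator: replace one demonstration's output with another's.

    Requires at least two demonstrations.
    """
    mutant = demos[:]
    i, j = random.sample(range(len(mutant)), 2)
    mutant[i] = (mutant[i][0], mutant[j][1])
    return mutant

def mutation_score(
    model: Callable[[List[Demo], str], str],  # prompts the LLM with demos + query
    demos: List[Demo],
    test_inputs: List[str],
    operators: List[Callable[[List[Demo]], List[Demo]]],
) -> float:
    """Fraction of mutants 'killed': a mutant counts as killed when the
    mutated demonstrations change the model's prediction on at least one
    test input, relative to the unmutated baseline."""
    baseline = [model(demos, x) for x in test_inputs]
    killed = 0
    for op in operators:
        mutant = op(demos)
        preds = [model(mutant, x) for x in test_inputs]
        if any(p != b for p, b in zip(preds, baseline)):
            killed += 1
    return killed / len(operators)

Under this reading, a higher mutation score suggests the test inputs are more sensitive to perturbations of the demonstrations, and hence better at exposing an unreliable ICL configuration.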