Tanay Dixit, Daniel Lee, Sally Fang, Sai Sree Harsha, Anirudh Sureshan, Akash Maharaj, Yunyao Li
{"title":"RETAIN:引导 LLM 迁移的回归测试互动工具","authors":"Tanay Dixit, Daniel Lee, Sally Fang, Sai Sree Harsha, Anirudh Sureshan, Akash Maharaj, Yunyao Li","doi":"arxiv-2409.03928","DOIUrl":null,"url":null,"abstract":"Large Language Models (LLMs) are increasingly integrated into diverse\napplications. The rapid evolution of LLMs presents opportunities for developers\nto enhance applications continuously. However, this constant adaptation can\nalso lead to performance regressions during model migrations. While several\ninteractive tools have been proposed to streamline the complexity of prompt\nengineering, few address the specific requirements of regression testing for\nLLM Migrations. To bridge this gap, we introduce RETAIN (REgression Testing\nguided LLM migrAtIoN), a tool designed explicitly for regression testing in LLM\nMigrations. RETAIN comprises two key components: an interactive interface\ntailored to regression testing needs during LLM migrations, and an error\ndiscovery module that facilitates understanding of differences in model\nbehaviors. The error discovery module generates textual descriptions of various\nerrors or differences between model outputs, providing actionable insights for\nprompt refinement. Our automatic evaluation and empirical user studies\ndemonstrate that RETAIN, when compared to manual evaluation, enabled\nparticipants to identify twice as many errors, facilitated experimentation with\n75% more prompts, and achieves 12% higher metric scores in a given time frame.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"55 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"RETAIN: Interactive Tool for Regression Testing Guided LLM Migration\",\"authors\":\"Tanay Dixit, Daniel Lee, Sally Fang, Sai Sree Harsha, Anirudh Sureshan, Akash Maharaj, Yunyao Li\",\"doi\":\"arxiv-2409.03928\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Large Language Models (LLMs) are increasingly integrated into diverse\\napplications. The rapid evolution of LLMs presents opportunities for developers\\nto enhance applications continuously. However, this constant adaptation can\\nalso lead to performance regressions during model migrations. While several\\ninteractive tools have been proposed to streamline the complexity of prompt\\nengineering, few address the specific requirements of regression testing for\\nLLM Migrations. To bridge this gap, we introduce RETAIN (REgression Testing\\nguided LLM migrAtIoN), a tool designed explicitly for regression testing in LLM\\nMigrations. RETAIN comprises two key components: an interactive interface\\ntailored to regression testing needs during LLM migrations, and an error\\ndiscovery module that facilitates understanding of differences in model\\nbehaviors. The error discovery module generates textual descriptions of various\\nerrors or differences between model outputs, providing actionable insights for\\nprompt refinement. 
Our automatic evaluation and empirical user studies\\ndemonstrate that RETAIN, when compared to manual evaluation, enabled\\nparticipants to identify twice as many errors, facilitated experimentation with\\n75% more prompts, and achieves 12% higher metric scores in a given time frame.\",\"PeriodicalId\":501281,\"journal\":{\"name\":\"arXiv - CS - Information Retrieval\",\"volume\":\"55 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.03928\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.03928","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
RETAIN: Interactive Tool for Regression Testing Guided LLM Migration
Large Language Models (LLMs) are increasingly integrated into diverse
applications. The rapid evolution of LLMs presents opportunities for developers
to enhance applications continuously. However, this constant adaptation can
also lead to performance regressions during model migrations. While several
interactive tools have been proposed to streamline the complexity of prompt
engineering, few address the specific requirements of regression testing for
LLM migrations. To bridge this gap, we introduce RETAIN (REgression Testing
guided LLM migrAtIoN), a tool designed explicitly for regression testing in LLM
migrations. RETAIN comprises two key components: an interactive interface
tailored to regression testing needs during LLM migrations, and an error
discovery module that facilitates understanding of differences in model
behaviors. The error discovery module generates textual descriptions of various
errors or differences between model outputs, providing actionable insights for
prompt refinement. Our automatic evaluation and empirical user studies
demonstrate that RETAIN, when compared to manual evaluation, enabled
participants to identify twice as many errors, facilitated experimentation with
75% more prompts, and achieved 12% higher metric scores in a given time frame.
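
To make the regression-testing idea concrete, the following is a minimal, illustrative sketch of comparing outputs from a baseline model and a migration-candidate model and flagging divergent cases. It is not the authors' RETAIN implementation; all names (Example, compare_outputs, the similarity threshold) are hypothetical assumptions for illustration, and it uses simple string similarity rather than RETAIN's error discovery module.

# Hypothetical sketch: flag prompts whose outputs diverge between a baseline
# model and a migrated model. Not the RETAIN implementation.
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class Example:
    prompt: str
    baseline_output: str   # output from the model being migrated from
    migrated_output: str   # output from the candidate (new) model

def compare_outputs(examples: list[Example], threshold: float = 0.8) -> list[dict]:
    """Return examples whose outputs diverge, as a crude stand-in for error discovery."""
    regressions = []
    for ex in examples:
        similarity = SequenceMatcher(None, ex.baseline_output, ex.migrated_output).ratio()
        if similarity < threshold:
            regressions.append({
                "prompt": ex.prompt,
                "similarity": round(similarity, 2),
                "baseline": ex.baseline_output,
                "migrated": ex.migrated_output,
            })
    return regressions

if __name__ == "__main__":
    examples = [
        Example("Summarize the report.", "The report covers Q3 sales.", "The report covers Q3 sales."),
        Example("Greet the user.", "Bonjour le monde", "Hello world"),
    ]
    for r in compare_outputs(examples):
        print(f"Possible regression (similarity={r['similarity']}): {r['prompt']}")

In practice, a tool like RETAIN would replace the string-similarity check with richer, textual descriptions of how the two models' behaviors differ, so that developers can act on the differences during prompt refinement.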