Kasra Hosseini, Thomas Kober, Josip Krapac, Roland Vollgraf, Weiwei Cheng, Ana Peleteiro Ramallo
{"title":"检索、注释、评估、重复:利用多模态 LLM 进行大规模产品检索评估","authors":"Kasra Hosseini, Thomas Kober, Josip Krapac, Roland Vollgraf, Weiwei Cheng, Ana Peleteiro Ramallo","doi":"arxiv-2409.11860","DOIUrl":null,"url":null,"abstract":"Evaluating production-level retrieval systems at scale is a crucial yet\nchallenging task due to the limited availability of a large pool of\nwell-trained human annotators. Large Language Models (LLMs) have the potential\nto address this scaling issue and offer a viable alternative to humans for the\nbulk of annotation tasks. In this paper, we propose a framework for assessing\nthe product search engines in a large-scale e-commerce setting, leveraging\nMultimodal LLMs for (i) generating tailored annotation guidelines for\nindividual queries, and (ii) conducting the subsequent annotation task. Our\nmethod, validated through deployment on a large e-commerce platform,\ndemonstrates comparable quality to human annotations, significantly reduces\ntime and cost, facilitates rapid problem discovery, and provides an effective\nsolution for production-level quality control at scale.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"2 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Retrieve, Annotate, Evaluate, Repeat: Leveraging Multimodal LLMs for Large-Scale Product Retrieval Evaluation\",\"authors\":\"Kasra Hosseini, Thomas Kober, Josip Krapac, Roland Vollgraf, Weiwei Cheng, Ana Peleteiro Ramallo\",\"doi\":\"arxiv-2409.11860\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Evaluating production-level retrieval systems at scale is a crucial yet\\nchallenging task due to the limited availability of a large pool of\\nwell-trained human annotators. Large Language Models (LLMs) have the potential\\nto address this scaling issue and offer a viable alternative to humans for the\\nbulk of annotation tasks. In this paper, we propose a framework for assessing\\nthe product search engines in a large-scale e-commerce setting, leveraging\\nMultimodal LLMs for (i) generating tailored annotation guidelines for\\nindividual queries, and (ii) conducting the subsequent annotation task. Our\\nmethod, validated through deployment on a large e-commerce platform,\\ndemonstrates comparable quality to human annotations, significantly reduces\\ntime and cost, facilitates rapid problem discovery, and provides an effective\\nsolution for production-level quality control at scale.\",\"PeriodicalId\":501281,\"journal\":{\"name\":\"arXiv - CS - Information Retrieval\",\"volume\":\"2 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.11860\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11860","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Evaluating production-level retrieval systems at scale is a crucial yet
challenging task due to the limited availability of a large pool of
well-trained human annotators. Large Language Models (LLMs) have the potential
to address this scaling issue and offer a viable alternative to humans for the
bulk of annotation tasks. In this paper, we propose a framework for assessing
the product search engines in a large-scale e-commerce setting, leveraging
Multimodal LLMs for (i) generating tailored annotation guidelines for
individual queries, and (ii) conducting the subsequent annotation task. Our
method, validated through deployment on a large e-commerce platform,
demonstrates comparable quality to human annotations, significantly reduces
time and cost, facilitates rapid problem discovery, and provides an effective
solution for production-level quality control at scale.