Retrieve, Annotate, Evaluate, Repeat: Leveraging Multimodal LLMs for Large-Scale Product Retrieval Evaluation

Kasra Hosseini, Thomas Kober, Josip Krapac, Roland Vollgraf, Weiwei Cheng, Ana Peleteiro Ramallo
{"title":"Retrieve, Annotate, Evaluate, Repeat: Leveraging Multimodal LLMs for Large-Scale Product Retrieval Evaluation","authors":"Kasra Hosseini, Thomas Kober, Josip Krapac, Roland Vollgraf, Weiwei Cheng, Ana Peleteiro Ramallo","doi":"arxiv-2409.11860","DOIUrl":null,"url":null,"abstract":"Evaluating production-level retrieval systems at scale is a crucial yet\nchallenging task due to the limited availability of a large pool of\nwell-trained human annotators. Large Language Models (LLMs) have the potential\nto address this scaling issue and offer a viable alternative to humans for the\nbulk of annotation tasks. In this paper, we propose a framework for assessing\nthe product search engines in a large-scale e-commerce setting, leveraging\nMultimodal LLMs for (i) generating tailored annotation guidelines for\nindividual queries, and (ii) conducting the subsequent annotation task. Our\nmethod, validated through deployment on a large e-commerce platform,\ndemonstrates comparable quality to human annotations, significantly reduces\ntime and cost, facilitates rapid problem discovery, and provides an effective\nsolution for production-level quality control at scale.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11860","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Evaluating production-level retrieval systems at scale is a crucial yet challenging task due to the limited availability of a large pool of well-trained human annotators. Large Language Models (LLMs) have the potential to address this scaling issue and offer a viable alternative to humans for the bulk of annotation tasks. In this paper, we propose a framework for assessing the product search engines in a large-scale e-commerce setting, leveraging Multimodal LLMs for (i) generating tailored annotation guidelines for individual queries, and (ii) conducting the subsequent annotation task. Our method, validated through deployment on a large e-commerce platform, demonstrates comparable quality to human annotations, significantly reduces time and cost, facilitates rapid problem discovery, and provides an effective solution for production-level quality control at scale.
检索、注释、评估、重复:利用多模态 LLM 进行大规模产品检索评估
由于训练有素的人类注释者数量有限,对生产级检索系统进行大规模评估是一项至关重要但又极具挑战性的任务。大型语言模型(LLM)有可能解决这一规模化问题,并为大量注释任务提供可行的人工替代方案。在本文中,我们提出了一个在大规模电子商务环境中评估产品搜索引擎的框架,利用多模态 LLM (i) 生成针对单个查询的定制注释指南,(ii) 执行后续注释任务。我们的方法通过在大型电子商务平台上的部署得到了验证,其质量可与人工标注相媲美,大大减少了时间和成本,有利于快速发现问题,并为大规模生产级质量控制提供了有效的解决方案。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信