Retrieve, Annotate, Evaluate, Repeat: Leveraging Multimodal LLMs for Large-Scale Product Retrieval Evaluation

arXiv - CS - Information Retrieval
Pub Date: 2024-09-18
DOI: arxiv-2409.11860

Kasra Hosseini, Thomas Kober, Josip Krapac, Roland Vollgraf, Weiwei Cheng, Ana Peleteiro Ramallo

Citations: 0

Abstract

Evaluating production-level retrieval systems at scale is a crucial yet challenging task because large pools of well-trained human annotators are scarce. Large Language Models (LLMs) have the potential to address this scaling issue and offer a viable alternative to humans for the bulk of annotation tasks. In this paper, we propose a framework for assessing product search engines in a large-scale e-commerce setting that leverages Multimodal LLMs for (i) generating tailored annotation guidelines for individual queries and (ii) conducting the subsequent annotation task. Our method, validated through deployment on a large e-commerce platform, achieves annotation quality comparable to that of human annotators, significantly reduces time and cost, facilitates rapid problem discovery, and provides an effective solution for production-level quality control at scale.
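The abstract describes a two-step use of a multimodal LLM: first generate annotation guidelines tailored to a query, then use them to annotate the products retrieved for that query. A minimal Python sketch of that loop follows. The `LLM` callable interface, the prompt wording, the binary relevant/irrelevant label set, and the simple percent-agreement check against human labels are all illustrative assumptions, not the authors' implementation, prompts, or evaluation metric. A stub function stands in for a real model so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical interface: a multimodal model call that takes a text prompt and an
# optional product image URL and returns the model's text reply.
LLM = Callable[[str, Optional[str]], str]
LABELS = {"relevant", "irrelevant"}  # assumed binary label set


@dataclass
class Product:
    product_id: str
    title: str
    image_url: str


def generate_guideline(llm: LLM, query: str) -> str:
    """Step (i): ask the model for annotation guidelines tailored to one query."""
    prompt = (
        "Write annotation guidelines for judging whether a product matches "
        f"the search query '{query}'. List the attributes that must match."
    )
    return llm(prompt, None)


def annotate(llm: LLM, query: str, guideline: str, product: Product) -> str:
    """Step (ii): label one retrieved product using the guideline, title and image."""
    prompt = (
        f"Guidelines:\n{guideline}\n\n"
        f"Query: {query}\nProduct title: {product.title}\n"
        "Answer with exactly one word: relevant or irrelevant."
    )
    answer = llm(prompt, product.image_url).strip().lower()
    return answer if answer in LABELS else "unparsed"  # flag malformed replies


def evaluate_query(llm: LLM, query: str, results: List[Product]) -> Dict[str, str]:
    """Generate the guideline once per query, then reuse it for every result."""
    guideline = generate_guideline(llm, query)
    return {p.product_id: annotate(llm, query, guideline, p) for p in results}


def agreement(llm_labels: Dict[str, str], human_labels: Dict[str, str]) -> float:
    """Fraction of products labelled by both where model and human agree."""
    shared = [pid for pid in human_labels if pid in llm_labels]
    if not shared:
        return 0.0
    return sum(llm_labels[pid] == human_labels[pid] for pid in shared) / len(shared)


# Usage with a hypothetical stub in place of a real multimodal model.
def stub_llm(prompt: str, image_url: Optional[str]) -> str:
    if image_url is None:
        return "A relevant product must be a dress and must be red."
    return "relevant" if "red" in image_url else "irrelevant"


results = [
    Product("p1", "Red midi dress", "https://example.com/red.jpg"),
    Product("p2", "Blue jeans", "https://example.com/blue.jpg"),
]
labels = evaluate_query(stub_llm, "red dress", results)
print(labels)  # {'p1': 'relevant', 'p2': 'irrelevant'}
print(agreement(labels, {"p1": "relevant", "p2": "relevant"}))  # 0.5
```

In this sketch the guideline is generated once per query and reused for every retrieved product, so each query costs one guideline call plus one call per product. The abstract does not say how the authors batch, cache, or score these calls.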
