LLM-as-a-Judge: automated evaluation of search query parsing using large language models.

IF 2.4 · Q3 · Computer Science, Information Systems
Frontiers in Big Data · Pub Date: 2025-07-21 · eCollection Date: 2025-01-01 · DOI: 10.3389/fdata.2025.1611389
Mehmet Selman Baysan, Serkan Uysal, İrem İşlek, Çağla Çığ Karaman, Tunga Güngör
{"title":"LLM-as-a-Judge: automated evaluation of search query parsing using large language models.","authors":"Mehmet Selman Baysan, Serkan Uysal, İrem İşlek, Çağla Çığ Karaman, Tunga Güngör","doi":"10.3389/fdata.2025.1611389","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>The adoption of Large Language Models (LLMs) in search systems necessitates new evaluation methodologies beyond traditional rule-based or manual approaches.</p><p><strong>Methods: </strong>We propose a general framework for evaluating structured outputs using LLMs, focusing on search query parsing within an online classified platform. Our approach leverages LLMs' contextual reasoning capabilities through three evaluation methodologies: Pointwise, Pairwise, and Pass/Fail assessments. Additionally, we introduce a Contextual Evaluation Prompt Routing strategy to improve reliability and reduce hallucinations.</p><p><strong>Results: </strong>Experiments conducted on both small- and large-scale datasets demonstrate that LLM-based evaluation achieves approximately 90% agreement with human judgments.</p><p><strong>Discussion: </strong>These results validate LLM-driven evaluation as a scalable, interpretable, and effective alternative to traditional evaluation methods, providing robust query parsing for real-world search systems.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"8 ","pages":"1611389"},"PeriodicalIF":2.4000,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12319771/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Big Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fdata.2025.1611389","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Introduction: The adoption of Large Language Models (LLMs) in search systems necessitates new evaluation methodologies beyond traditional rule-based or manual approaches.

Methods: We propose a general framework for evaluating structured outputs using LLMs, focusing on search query parsing within an online classified platform. Our approach leverages LLMs' contextual reasoning capabilities through three evaluation methodologies: Pointwise, Pairwise, and Pass/Fail assessments. Additionally, we introduce a Contextual Evaluation Prompt Routing strategy to improve reliability and reduce hallucinations.
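Below is a minimal sketch of how the three evaluation modes named above (Pointwise, Pairwise, and Pass/Fail) might be issued to an LLM judge. The prompt wording, scoring scale, and the `call_llm` helper are illustrative assumptions, not the authors' implementation, and the Contextual Evaluation Prompt Routing step is not shown.

```python
# Illustrative sketch of LLM-as-a-Judge evaluation modes for search query parsing.
# `call_llm` is a hypothetical stand-in for whatever LLM API the pipeline uses.

import json
from typing import Callable

LLMCall = Callable[[str], str]

def pointwise_judge(call_llm: LLMCall, query: str, parse: dict) -> int:
    """Score a single parsed query on an assumed 1-5 scale."""
    prompt = (
        "You are evaluating a search-query parser for an online classifieds platform.\n"
        f"Query: {query}\n"
        f"Parsed output: {json.dumps(parse, ensure_ascii=False)}\n"
        "Rate how faithfully the parse captures the query intent on a 1-5 scale. "
        "Answer with the number only."
    )
    return int(call_llm(prompt).strip())

def pairwise_judge(call_llm: LLMCall, query: str, parse_a: dict, parse_b: dict) -> str:
    """Return 'A', 'B', or 'TIE' for two candidate parses of the same query."""
    prompt = (
        f"Query: {query}\n"
        f"Parse A: {json.dumps(parse_a, ensure_ascii=False)}\n"
        f"Parse B: {json.dumps(parse_b, ensure_ascii=False)}\n"
        "Which parse better reflects the query? Answer A, B, or TIE."
    )
    return call_llm(prompt).strip().upper()

def pass_fail_judge(call_llm: LLMCall, query: str, parse: dict) -> bool:
    """Binary acceptance decision for a single parsed query."""
    prompt = (
        f"Query: {query}\n"
        f"Parsed output: {json.dumps(parse, ensure_ascii=False)}\n"
        "Is this parse acceptable for serving search results? Answer PASS or FAIL."
    )
    return call_llm(prompt).strip().upper() == "PASS"
```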

Results: Experiments conducted on both small- and large-scale datasets demonstrate that LLM-based evaluation achieves approximately 90% agreement with human judgments.
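The abstract does not specify how agreement is computed; a simple percent-agreement calculation (which may differ from the authors' exact metric) would look like the following sketch.

```python
# Assumed percent-agreement metric: the fraction of items where the LLM judge's
# verdict matches the human label. Verdict values are illustrative.

def agreement_rate(llm_verdicts: list, human_labels: list) -> float:
    """Fraction of items on which the LLM judge and the human annotator agree."""
    assert len(llm_verdicts) == len(human_labels)
    matches = sum(l == h for l, h in zip(llm_verdicts, human_labels))
    return matches / len(human_labels)

# Example: 9 matching verdicts out of 10 -> 0.9, i.e. ~90% agreement.
print(agreement_rate(["PASS"] * 9 + ["FAIL"], ["PASS"] * 10))  # 0.9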

Discussion: These results validate LLM-driven evaluation as a scalable, interpretable, and effective alternative to traditional evaluation methods, supporting robust query parsing in real-world search systems.

Source journal: Frontiers in Big Data
CiteScore: 5.20
Self-citation rate: 3.20%
Articles published: 122
Review turnaround: 13 weeks