Optimizing document retrieval using massive text embeddings and LLM prompt engineering.

IF 3.9 · CAS Zone 4 (Medicine) · JCR Q1 (MEDICINE, GENERAL & INTERNAL)
Goran Mitrov, Boris Stanoev, Vladimir Trajkovik, Biljana Risteska Stojkoska, Lasko Basnarkov, Petre Lameski, Martin Kampel, Eftim Zdravevski
Journal Article · Systematic Reviews · published 2026-04-14 · DOI: 10.1186/s13643-026-03155-4 (https://doi.org/10.1186/s13643-026-03155-4)
Citations: 0

Abstract

Background: The rapid expansion of digital data poses a unique challenge for retrieving relevant and insightful information efficiently. In particular, the increasing volume of scientific publications has made literature reviews time-consuming. The emergence of large language models (LLMs) offers new opportunities to streamline this process.

Methods: This paper explores the use of generative artificial intelligence (GenAI) for query reformulation and evaluates the performance of nine massive text embedding models, varying in size and fine-tuning strategy, for document retrieval. We apply multiple prompt engineering techniques to assess the ability of LLMs to generate effective queries and compare these with human-crafted queries. Both sets of queries are then used to retrieve documents with the nine embedding models. The evaluation spans five datasets and uses metrics such as recall, average precision, and rank-based measures.
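The Methods paragraph describes a dense-retrieval loop: a query (human-written or LLM-generated) is embedded, documents are ranked by their similarity to it, and the ranking is scored against the known relevant documents. The sketch below illustrates that loop under stated assumptions, not the authors' implementation: the toy three-dimensional vectors stand in for real embedding-model output (none of the nine models is reproduced), cosine similarity is assumed as the scoring function, and reciprocal rank is used only as one example of a rank-based measure because the abstract does not name the specific ones. All function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_documents(query_vec, doc_vecs):
    """Return document ids sorted by descending cosine similarity to the query."""
    scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=lambda d: scores[d], reverse=True)

def recall_at_k(ranking, relevant, k):
    """Fraction of relevant documents found in the top-k results (relevant must be non-empty)."""
    hits = sum(1 for d in ranking[:k] if d in relevant)
    return hits / len(relevant)

def average_precision(ranking, relevant):
    """Mean of precision@i taken at each rank i where a relevant document appears."""
    hits, total = 0, 0.0
    for i, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)

def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for i, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / i
    return 0.0

# Hypothetical toy "embeddings"; a real pipeline would obtain these from an embedding model.
query = [1.0, 0.0, 0.0]
docs = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.0, 1.0, 0.0],
    "d3": [0.7, 0.0, 0.7],
    "d4": [0.1, 0.0, 1.0],
}
relevant = {"d1", "d4"}

ranking = rank_documents(query, docs)
print(ranking)                                        # ['d1', 'd3', 'd4', 'd2']
print(recall_at_k(ranking, relevant, 2))              # 0.5
print(round(average_precision(ranking, relevant), 3)) # 0.833
print(reciprocal_rank(ranking, relevant))             # 1.0
```

Comparing query-generation strategies then amounts to running the same ranking and scoring for each query variant and each embedding model, and averaging the metrics over topics and datasets.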

Results: The results show that embedding models fine-tuned for semantic similarity consistently outperform general-purpose models, with UAE Large proving the most robust across diverse domains. Furthermore, queries generated with zero-shot and few-shot prompting often outperform human-formulated queries.
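The difference between zero-shot and few-shot prompting lies in how the prompt is assembled: a zero-shot prompt contains only the instruction and the target topic, while a few-shot prompt first shows the model worked topic-to-query examples. The sketch below is hypothetical: the instruction wording, the demonstration pair, and the function names are illustrative assumptions, not the prompts used in the study, and no LLM is actually called.

```python
from dataclasses import dataclass

@dataclass
class Example:
    """A hypothetical demonstration pair used in few-shot prompting."""
    topic: str
    query: str

# Hypothetical instruction text; the study's actual prompts are not given in the abstract.
INSTRUCTION = (
    "Rewrite the research topic below as a concise search query "
    "for retrieving relevant scientific articles."
)

def zero_shot_prompt(topic: str) -> str:
    """Instruction plus the target topic, with no demonstrations."""
    return f"{INSTRUCTION}\n\nTopic: {topic}\nQuery:"

def few_shot_prompt(topic: str, examples: list[Example]) -> str:
    """Instruction, then worked topic -> query pairs, then the target topic."""
    shots = "\n\n".join(f"Topic: {ex.topic}\nQuery: {ex.query}" for ex in examples)
    return f"{INSTRUCTION}\n\n{shots}\n\nTopic: {topic}\nQuery:"

# Illustrative usage with a made-up demonstration pair.
demos = [Example(topic="Wearable sensors for fall detection in older adults",
                 query="wearable sensor fall detection elderly")]
zero = zero_shot_prompt("Deep learning for skin lesion classification")
few = few_shot_prompt("Deep learning for skin lesion classification", demos)
print(zero)
print(few)
```

Either prompt string would be sent to an LLM, and the completion after the final `Query:` would serve as the generated query passed to the embedding-based retrieval step.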

Conclusion: These findings highlight the value of integrating LLMs and massive text embeddings to reduce manual effort in literature reviews. GenAI provides a reliable starting point for query formulation, with human input reserved for refinement when needed.

Source journal: Systematic Reviews
Subject area: Medicine-Medicine (miscellaneous)
CiteScore: 8.30
Self-citation rate: 0.00%
Articles published: 241
Review time: 11 weeks
Journal description: Systematic Reviews encompasses all aspects of the design, conduct and reporting of systematic reviews. The journal publishes high-quality systematic review products, including systematic review protocols, systematic reviews related to a very broad definition of health, rapid reviews, updates of already completed systematic reviews, and methods research related to the science of systematic reviews, such as decision modelling. At this time, Systematic Reviews does not accept reviews of in vitro studies. The journal also aims to ensure that the results of all well-conducted systematic reviews are published, regardless of their outcome.