Optimizing document retrieval using massive text embeddings and LLM prompt engineering.

IF 3.9 | CAS Zone 4 (Medicine) | JCR Q1, MEDICINE, GENERAL & INTERNAL

Systematic Reviews. Published 2026-04-14. DOI: 10.1186/s13643-026-03155-4

Goran Mitrov, Boris Stanoev, Vladimir Trajkovik, Biljana Risteska Stojkoska, Lasko Basnarkov, Petre Lameski, Martin Kampel, Eftim Zdravevski

Citations: 0

Abstract

Background: The rapid growth of digital data makes it increasingly difficult to retrieve relevant, insightful information efficiently. In particular, the growing volume of scientific publications has made literature reviews time-consuming. The emergence of large language models (LLMs) offers new opportunities to streamline this process.

Methods: This paper explores the use of generative artificial intelligence (GenAI) for query reformulation and evaluates nine massive text embedding models, varying in size and fine-tuning strategy, for document retrieval. We apply multiple prompt engineering techniques to assess how well LLMs generate effective queries and compare the results with human-crafted queries. Both the LLM-generated and the human-crafted queries are then used to retrieve documents with the nine embedding models. The evaluation spans five datasets and uses metrics such as recall, average precision, and rank-based measures.
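
The retrieval and evaluation steps described above can be illustrated with a minimal sketch, assuming the query and documents have already been converted to vectors by some embedding model. The three-dimensional vectors are invented stand-ins (not output from UAE Large or any other model in the study), the relevance judgements are hypothetical, and the helper names are illustrative. Ranking uses cosine similarity, and the ranking is scored with recall@k and non-interpolated average precision; the paper's exact pipeline and its rank-based measures are not specified in the abstract.

```python
import math
from typing import Dict, List, Sequence, Set

# Toy "embeddings": invented 3-dimensional vectors standing in for the output of a
# real embedding model. They are NOT produced by any model evaluated in the paper.
DOC_EMBEDDINGS: Dict[str, List[float]] = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.1, 0.9, 0.0],
    "d3": [0.7, 0.3, 0.1],
    "d4": [0.0, 0.2, 0.9],
}
QUERY_EMBEDDING: List[float] = [1.0, 0.2, 0.0]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Return document IDs sorted by descending cosine similarity to the query."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)


def recall_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear among the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranking[:k]) & relevant) / len(relevant)


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Sum of precision@i at each rank i holding a relevant document, divided by |relevant|."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for i, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / i
    return precision_sum / len(relevant)


if __name__ == "__main__":
    ranking = rank_documents(QUERY_EMBEDDING, DOC_EMBEDDINGS)
    relevant = {"d1", "d2"}  # hypothetical relevance judgements
    print(ranking)
    print(recall_at_k(ranking, relevant, k=2))
    print(average_precision(ranking, relevant))
```

With these toy values the ranking is d1, d3, d2, d4, so recall@2 is 0.5 and average precision is 5/6 (about 0.83). Swapping in a different embedding model changes only the vectors, which is what allows several models to be compared under the same metrics.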

Results: Embedding models fine-tuned for semantic similarity consistently outperform general-purpose models, with UAE Large proving the most robust across diverse domains. Furthermore, queries generated with zero-shot and few-shot prompting often outperform human-formulated queries.
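
Zero-shot and few-shot prompting differ only in whether worked examples precede the new input. The sketch below assembles both prompt styles for query reformulation. The instruction wording, the demonstration pair, and the function names are hypothetical illustrations, not the prompts used in the paper, and no LLM is called; the returned string is what would be sent to a model.

```python
from typing import List, Tuple

# Hypothetical instruction wording; the prompts used in the paper are not
# given in the abstract and are not reproduced here.
INSTRUCTION = (
    "Rewrite the following research question as a concise search query "
    "for retrieving relevant scientific articles."
)

# Hypothetical (question, query) demonstration pair for few-shot prompting.
EXAMPLES: List[Tuple[str, str]] = [
    (
        "How accurate are smartwatches at detecting atrial fibrillation?",
        "smartwatch atrial fibrillation detection accuracy",
    ),
]


def zero_shot_prompt(question: str) -> str:
    """Zero-shot: the instruction plus the new question, with no demonstrations."""
    return f"{INSTRUCTION}\n\nQuestion: {question}\nQuery:"


def few_shot_prompt(question: str, examples: List[Tuple[str, str]]) -> str:
    """Few-shot: the instruction, worked (question, query) pairs, then the new question."""
    demos = "\n\n".join(f"Question: {q}\nQuery: {a}" for q, a in examples)
    return f"{INSTRUCTION}\n\n{demos}\n\nQuestion: {question}\nQuery:"


if __name__ == "__main__":
    # The resulting string would be passed to an LLM of choice; no model is called here.
    print(few_shot_prompt("Do wearable sensors help detect falls in older adults?", EXAMPLES))
```

Ending both prompts with "Query:" leaves the model to complete only the reformulated query, which keeps the output easy to pass straight to the retrieval step.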

Conclusion: These findings highlight the value of integrating LLMs and massive text embeddings to reduce manual effort in literature reviews. GenAI provides a reliable starting point for query formulation, with human input reserved for refinement when needed.

Source journal

Systematic Reviews, Medicine-Medicine (miscellaneous)

CiteScore: 8.30
Self-citation rate: 0.00%
Articles published: 241
Review time: 11 weeks

Journal description: Systematic Reviews encompasses all aspects of the design, conduct and reporting of systematic reviews. The journal publishes high-quality systematic review products, including systematic review protocols, systematic reviews related to a very broad definition of health, rapid reviews, updates of already completed systematic reviews, and methods research related to the science of systematic reviews, such as decision modelling. At this time, Systematic Reviews does not accept reviews of in vitro studies. The journal also aims to ensure that the results of all well-conducted systematic reviews are published, regardless of their outcome.