LITERAS: Biomedical literature review and citation retrieval agents

Impact Factor: 6.3 · CAS Tier 2 (Medicine) · JCR Q1 (Biology)
Alon Gorenshtein, Kamel Shihada, Moran Sorka, Dvir Aran, Shahar Shelly
DOI: 10.1016/j.compbiomed.2025.110363
Journal: Computers in Biology and Medicine, Volume 192, Article 110363
Published: 2025-05-17 (Journal Article)
Full text: https://www.sciencedirect.com/science/article/pii/S0010482525007140
Citation count: 0

Abstract

Background

Existing tools for reference retrieval using large language models (LLMs) frequently generate inaccurate citations, gray literature, or fabricated references, leading to poor accuracy. In this study, we aim to address this gap by developing a highly accurate reference retrieval system focused on the precision and reliability of citations across five medical fields.

Methods

We developed LITERAS (literature review and citation retrieval agents), an open-source multi-AI-agent system designed to generate literature review drafts with accurate and confirmable citations. LITERAS integrates search over the largest biomedical literature database (MEDLINE) via PubMed's application programming interface with bidirectional inter-agent communication to enhance citation accuracy and reliability. To evaluate its performance, we compared LITERAS to the state-of-the-art LLMs Sonar and Sonar-Pro by Perplexity AI. The evaluation covered five distinct medical disciplines: Oncology, Cardiology, Rheumatology, Psychiatry, and Infectious Diseases/Public Health, focusing on the credibility, precision, and confirmation of citations, as well as the overall quality of the referenced sources.
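The abstract states that LITERAS searches MEDLINE through PubMed's application programming interface, but the paper's retrieval code is not shown here. As a hedged illustration only, a MEDLINE query against NCBI's public E-utilities `esearch` endpoint might be constructed like this (the query term, result cap, and sort choice are assumptions, not details from the paper):

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def build_esearch_url(query: str, max_results: int = 20) -> str:
    """Build an NCBI E-utilities esearch URL for a PubMed query.

    The JSON response at this URL lists matching PubMed IDs under
    result["esearchresult"]["idlist"], which can then be fetched in
    full with the efetch endpoint.
    """
    params = urlencode({
        "db": "pubmed",          # search the MEDLINE/PubMed database
        "term": query,           # free-text or fielded query string
        "retmax": max_results,   # cap on the number of returned PMIDs
        "retmode": "json",       # JSON instead of the default XML
        "sort": "pub_date",      # favor recent articles, as LITERAS does
    })
    return f"{EUTILS_BASE}/esearch.fcgi?{params}"

url = build_esearch_url("multi-agent systems AND citation retrieval")
```

Sorting by publication date is one plausible way to realize the system's observed preference for recent articles; the paper itself does not specify its query parameters.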

Results

LITERAS achieved near-perfect citation accuracy (i.e., whether references match real publications) at 99.82%, statistically indistinguishable from Sonar (100.00%, p = 0.065) and Sonar-Pro (99.93%, p = 0.074). On referencing accuracy (the consistency between in-text citation details and metadata), LITERAS (96.81%) significantly outperformed Sonar (89.07%, p < 0.001) and matched Sonar-Pro (96.33%, p = 0.139). Notably, LITERAS relied exclusively on Q1–Q2, peer-reviewed journals (0% nonacademic content), whereas Sonar drew 35.60% of its sources from nonacademic content (p < 0.01) and Sonar-Pro 6.47% (p < 0.001). However, Sonar-Pro cited higher-impact journals than LITERAS (median impact factor (IF) = 14.70 vs. 3.70, p < 0.001). LITERAS's multi-agent loop (2.2 ± 1.34 iterations per query) minimized hallucinations and consistently prioritized recent articles (IQR = 2023–2024). Field-specific analysis showed that oncology had the largest IF discrepancy (Sonar-Pro 42.1 vs. LITERAS 4.3, p < 0.001), reflecting Sonar-Pro's preference for major consortium guidelines and high-impact meta-analyses.
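The results distinguish citation accuracy (does the cited reference exist) from referencing accuracy (do the in-text details agree with the record's metadata). The paper does not describe its matching procedure; a minimal sketch of one plausible consistency check, with hypothetical field names and a deliberately simple title/year comparison, could look like this:

```python
def reference_matches(cited: dict, record: dict) -> bool:
    """Check whether in-text citation details agree with database metadata.

    `cited` holds the details as written in the review draft; `record`
    holds metadata retrieved from a bibliographic database. Titles are
    compared case-insensitively after trimming whitespace, and the
    publication year must match exactly. Field names are illustrative.
    """
    same_title = cited["title"].strip().lower() == record["title"].strip().lower()
    same_year = cited["year"] == record["year"]
    return same_title and same_year

def referencing_accuracy(pairs: list) -> float:
    """Fraction of (cited, record) pairs whose details are consistent."""
    if not pairs:
        return 0.0
    return sum(reference_matches(c, r) for c, r in pairs) / len(pairs)
```

A real system would need fuzzier matching (author lists, journal names, DOI normalization); this sketch only shows the shape of the metric, not the paper's implementation.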

Conclusion

LITERAS retrieved significantly more recent academic journal articles and generated longer summary reports than academic-search LLM approaches in literature review tasks. This work provides insights into improving the reliability of AI-assisted literature review systems.
Source journal

Computers in Biology and Medicine (Engineering, Technology: Biomedical Engineering)
CiteScore: 11.70
Self-citation rate: 10.40%
Articles per year: 1086
Review time: 74 days
Journal description: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. The journal serves as a medium for communicating essential research, instruction, ideas, and information on the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, it aims to facilitate progress and innovation in the use of computers in biology and medicine.