High-throughput biomedical relation extraction for semi-structured web articles empowered by large language models.

IF 3.8 | CAS Tier 3 (Medicine) | Q2 MEDICAL INFORMATICS
Songchi Zhou, Sheng Yu
{"title":"High-throughput biomedical relation extraction for semi-structured web articles empowered by large language models.","authors":"Songchi Zhou, Sheng Yu","doi":"10.1186/s12911-025-03204-3","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>We aim to develop a high-throughput biomedical relation extraction system tailored for semi-structured biomedical websites, leveraging the reading comprehension abilities and domain-specific medical knowledge of large language models (LLMs).</p><p><strong>Methods: </strong>We formulate relation extraction as a series of binary classification problems. Given the context from semi-structured biomedical web articles, LLMs decide whether a relation holds while providing accompanying rationales for factual verification. The article's main title is designated as the tail entity, and candidate head entities are identified by matching against a biomedical thesaurus with semantic typing to guide candidate relation types. To assess system performance and robustness, we compare general-purpose, domain-adapted, and parameter-efficient LLMs on an expert-curated benchmark, evaluating their relative effectiveness in extracting relations from semi-structured biomedical websites.</p><p><strong>Results: </strong>Domain-adapted models consistently outperform their general-purpose counterparts. Specifically, MedGemma-27B achieves an F1 score of 0.820 and Cohen's Kappa of 0.677, representing clear improvements over its base model Gemma3-27B (F1 = 0.771, Kappa = 0.604). Notably, MedGemma-27B also surpasses OpenAI's GPT-4o (F1 = 0.708, Kappa = 0.561) and GPT-4.1 (F1 = 0.732, Kappa = 0.597), demonstrating the advantage of biomedical domain adaptation even over stronger proprietary models. Among all evaluated models, DeepSeek-V3 yields the best overall performance (F1 = 0.844, Kappa = 0.730). Using MedGemma-27B, we extracted 225,799 relation triplets across three relation types from three authoritative biomedical websites. Case studies further highlight both the strengths and persistent challenges of different LLM classes in biomedical relation extraction from semi-structured content.</p><p><strong>Conclusion: </strong>Our study demonstrates that LLMs can serve as effective engines for high-throughput biomedical relation extraction, with domain-adapted and parameter-efficient models offering practical advantages. The framework is scalable and broadly adaptable, enabling efficient extraction of diverse biomedical relations across heterogeneous semi-structured websites. 
Beyond technical performance, the ability to extract reliable biomedical relations at scale can directly benefit clinical applications, such as enriching biomedical knowledge graphs, supporting evidence-based guideline development, and ultimately assisting clinicians in accessing structured medical knowledge for decision-making.</p>","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":"25 1","pages":"351"},"PeriodicalIF":3.8000,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12482089/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Medical Informatics and Decision Making","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s12911-025-03204-3","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICAL INFORMATICS","Score":null,"Total":0}
Citations: 0

Abstract

Background: We aim to develop a high-throughput biomedical relation extraction system tailored for semi-structured biomedical websites, leveraging the reading comprehension abilities and domain-specific medical knowledge of large language models (LLMs).

Methods: We formulate relation extraction as a series of binary classification problems. Given the context from semi-structured biomedical web articles, LLMs decide whether a relation holds while providing accompanying rationales for factual verification. The article's main title is designated as the tail entity, and candidate head entities are identified by matching against a biomedical thesaurus with semantic typing to guide candidate relation types. To assess system performance and robustness, we compare general-purpose, domain-adapted, and parameter-efficient LLMs on an expert-curated benchmark, evaluating their relative effectiveness in extracting relations from semi-structured biomedical websites.
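
To make the binary-classification framing concrete, the following is a minimal sketch under stated assumptions: the function names, prompt wording, and the mapping from semantic types to candidate relation types are illustrative, not the authors' implementation, and llm_complete stands in for any text-in/text-out LLM call.

# Hypothetical sketch of the binary relation-classification framing:
# one yes/no question per candidate relation, with the model's answer
# kept as a rationale for factual verification.
from typing import Callable, Tuple

# Assumed mapping from a candidate head entity's semantic type to the
# candidate relation type it may hold with the article's main title (tail).
SEMANTIC_TYPE_TO_RELATION = {
    "Pharmacologic Substance": "may_treat",
    "Sign or Symptom": "has_symptom",
    "Disease or Syndrome": "is_risk_factor_of",
}

PROMPT_TEMPLATE = (
    "Context from a semi-structured biomedical web article:\n{context}\n\n"
    "Question: does the relation ({head}, {relation}, {tail}) hold in this "
    "context? Answer 'yes' or 'no', then give a brief rationale citing the text."
)

def classify_relation(
    llm_complete: Callable[[str], str],  # any text-in/text-out LLM call
    context: str,
    head: str,
    head_semantic_type: str,
    tail: str,
) -> Tuple[bool, str]:
    """Single binary decision: does (head, relation, tail) hold in the context?"""
    relation = SEMANTIC_TYPE_TO_RELATION[head_semantic_type]
    prompt = PROMPT_TEMPLATE.format(
        context=context, head=head, relation=relation, tail=tail
    )
    answer = llm_complete(prompt)
    holds = answer.strip().lower().startswith("yes")
    return holds, answer  # the raw answer doubles as the rationale

Iterating such a call over every candidate head entity matched from the thesaurus would yield the article's (head, relation, tail) triplets.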

Results: Domain-adapted models consistently outperform their general-purpose counterparts. Specifically, MedGemma-27B achieves an F1 score of 0.820 and Cohen's Kappa of 0.677, representing clear improvements over its base model Gemma3-27B (F1 = 0.771, Kappa = 0.604). Notably, MedGemma-27B also surpasses OpenAI's GPT-4o (F1 = 0.708, Kappa = 0.561) and GPT-4.1 (F1 = 0.732, Kappa = 0.597), demonstrating the advantage of biomedical domain adaptation even over stronger proprietary models. Among all evaluated models, DeepSeek-V3 yields the best overall performance (F1 = 0.844, Kappa = 0.730). Using MedGemma-27B, we extracted 225,799 relation triplets across three relation types from three authoritative biomedical websites. Case studies further highlight both the strengths and persistent challenges of different LLM classes in biomedical relation extraction from semi-structured content.
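
For reference, the reported F1 and Cohen's Kappa are standard binary-agreement metrics and can be computed from a model's yes/no decisions against the expert annotations; the sketch below uses toy label arrays for illustration, not the paper's benchmark data.

# Toy illustration of computing the reported metrics with scikit-learn;
# the labels are invented, not the expert-curated benchmark.
from sklearn.metrics import f1_score, cohen_kappa_score

y_true = [1, 1, 0, 1, 0, 0, 1, 0]  # expert annotations (1 = relation holds)
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]  # LLM binary decisions

print(f"F1 = {f1_score(y_true, y_pred):.3f}")             # harmonic mean of precision and recall
print(f"Kappa = {cohen_kappa_score(y_true, y_pred):.3f}")  # chance-corrected agreement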

Conclusion: Our study demonstrates that LLMs can serve as effective engines for high-throughput biomedical relation extraction, with domain-adapted and parameter-efficient models offering practical advantages. The framework is scalable and broadly adaptable, enabling efficient extraction of diverse biomedical relations across heterogeneous semi-structured websites. Beyond technical performance, the ability to extract reliable biomedical relations at scale can directly benefit clinical applications, such as enriching biomedical knowledge graphs, supporting evidence-based guideline development, and ultimately assisting clinicians in accessing structured medical knowledge for decision-making.

Source journal: BMC Medical Informatics and Decision Making
CiteScore: 7.20
Self-citation rate: 5.70%
Articles published: 297
Review time: 1 month
Journal description: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.