Journal of Biomedical Informatics: Latest Articles

Accounting for population structure in deep learning models for genomic analysis
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics Pub Date: 2025-07-05 DOI: 10.1016/j.jbi.2025.104873
Gabrielle Dagasso, Matthias Wilms, Raissa Souza, Nils D. Forkert
Abstract
Background: Deep learning methods are becoming increasingly popular for genotype analyses in recent years. In conventional genomic analyses, it is important to account for confounders to avoid biasing the results. Genetic relatedness is one of the most common confounders in conventional genomic analyses and there is a general consensus that it should be considered in the analysis to prevent distant levels of common ancestry from affecting the identification of causal variants. In contrast, genetic relatedness is not considered or ignored in many of the recently published deep learning models.
Objective: This study investigates whether the omission of genetic relatedness in deep learning models, common in recent literature, introduces confounding effects similar to those observed in conventional genomic analyses, particularly due to ancestry-related variants.
Methods: We developed and used a deep learning model to perform classifications based on single nucleotide polymorphism data from simulated and real-world datasets to examine whether population structure is confounding the model and potentially causing shortcut learning.
Results: The results of this study suggest that population structure may not significantly affect the performance of the deep learning model. However, explainable AI revealed notable differences in the focus between the confounded and unconfounded models when examining SNP feature importance.
Conclusion: While population structure may not heavily affect model performance, it is important to reduce the models’ capabilities of shortcut learning when designing deep learning models for analyzing genomic datasets, by using ancestry-related variants over potentially relevant biomarkers of the disease or disorder in question. The code used to perform these analyses can be found at: https://github.com/notTrivial/populationStructure.
Volume 169, Article 104873
Citations: 0
Leveraging heterogeneous tabular of EHRs with prompt learning for clinical prediction
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics Pub Date: 2025-07-04 DOI: 10.1016/j.jbi.2025.104868
Xuebing Yang, Longyu Li, Chutong Wang, Wensheng Zhang, Huizhou Liu, Wen Tang
Abstract
Electronic Health Records (EHRs) depict patient-related information and have significantly contributed to advancements in healthcare fields. The abundance of EHR data provides exceptional opportunities for developing clinical predictive models. However, the heterogeneity within multi-source EHR data raises a difficulty to organically leverage information from structured and unstructured features. In this paper, we focus on the heterogeneous EHR data in the tabular form, and propose a Prompt learning based data Fusion framework for Tabular (TabPF) to extract patient representations for clinical prediction. First, we design a text summary generator module to convert medical tabular into vector representations through long text embedding. Specifically, the tailored prompt learning is conducted for leading the Large Language Model (LLM) to respectively generate appropriate text summaries for different types of tabular data. Second, we design a novel attention mechanism of Transformer to effectively realize heterogeneous data fusion and generate more comprehensive patient representations for downstream predictions. The experiments are performed on the publicly available eICU-CRD dataset and the real-world CECMed dataset containing elderly patients diagnosed with chronic diseases, in comparison with representative baseline models. The results validate the superior performance of TabPF in predicting severity, mortality and Length of Stay (LoS). Furthermore, extensive ablation study and model variants evaluations demonstrate the effectiveness of the key component of the proposed framework.
Volume 168, Article 104868
Citations: 0
How well do multimodal LLMs interpret CT scans? An auto-evaluation framework for analyses
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics Pub Date: 2025-06-25 DOI: 10.1016/j.jbi.2025.104864
Qingqing Zhu, Benjamin Hou, Tejas Sudarshan Mathai, Pritam Mukherjee, Qiao Jin, Xiuying Chen, Zhizheng Wang, Ruida Cheng, Ronald M. Summers, Zhiyong Lu
Abstract
Objective: This study introduces a novel evaluation framework, GPTRadScore, to systematically assess the performance of multimodal large language models (MLLMs) in generating clinically accurate findings from CT imaging. Specifically, GPTRadScore leverages LLMs as an evaluation metric, aiming to provide a more accurate and clinically informed assessment than traditional language-specific methods. Using this framework, we evaluate the capability of several MLLMs, including GPT-4 with Vision (GPT-4V), Gemini Pro Vision, LLaVA-Med, and RadFM, to interpret findings in CT scans.
Methods: This retrospective study leverages a subset of the public DeepLesion dataset to evaluate the performance of several multimodal LLMs in describing findings in CT slices. GPTRadScore was developed to assess the generated descriptions (location, body part, and type) using GPT-4, alongside traditional metrics. RadFM was fine-tuned using a subset of the DeepLesion dataset with additional labeled examples targeting complex findings. Post fine-tuning, performance was reassessed using GPTRadScore to measure accuracy improvements.
Results: Evaluations demonstrated a high correlation of GPTRadScore with clinician assessments, with Pearson’s correlation coefficients of 0.87, 0.91, 0.75, 0.90, and 0.89. These results highlight its superiority over traditional metrics, such as BLEU, METEOR, and ROUGE, and indicate that GPTRadScore can serve as a reliable evaluation metric. Using GPTRadScore, it was observed that while GPT-4V and Gemini Pro Vision outperformed other models, significant areas for improvement remain, primarily due to limitations in the datasets used for training. Fine-tuning RadFM resulted in substantial accuracy gains: location accuracy increased from 3.41% to 12.8%, body part accuracy improved from 29.12% to 53%, and type accuracy rose from 9.24% to 30%. These findings reinforce the hypothesis that fine-tuning RadFM can significantly enhance its performance.
Conclusion: GPT-4 effectively correlates with expert assessments, validating its use as a reliable metric for evaluating multimodal LLMs in radiological diagnostics. Additionally, the results underscore the efficacy of fine-tuning approaches in improving the descriptive accuracy of LLM-generated medical imaging findings.
Volume 168, Article 104864
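The abstract above reports Pearson correlation coefficients between GPTRadScore and clinician assessments. As a minimal sketch of how such an agreement figure can be computed, the following uses only the standard library; the function name `pearson` and the score lists are hypothetical illustrations, not data or code from the paper.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations (numerator) and root sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (norm_x * norm_y)


# Hypothetical scores: an automatic metric vs. clinician ratings per report.
metric_scores = [0.2, 0.4, 0.5, 0.9]
clinician_scores = [1.0, 2.0, 2.0, 4.0]
agreement = pearson(metric_scores, clinician_scores)
```

A value close to 1 would indicate that the automatic metric ranks and spaces the reports much as the clinicians do; it says nothing about whether either score is clinically correct.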
Citations: 0
A comparative study of recent large language models on generating hospital discharge summaries for lung cancer patients
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics Pub Date: 2025-06-20 DOI: 10.1016/j.jbi.2025.104867
Yiming Li, Fang Li, Na Hong, Manqi Li, Kirk Roberts, Licong Cui, Cui Tao, Hua Xu
Abstract
Objective: Generating discharge summaries is a crucial yet time-consuming task in clinical practice, essential for conveying pertinent patient information and facilitating continuity of care. Recent advancements in large language models (LLMs) have significantly enhanced their capability in understanding and summarizing complex medical texts. This research aims to explore how LLMs can alleviate the burden of manual summarization, streamline workflow efficiencies, and support informed decision-making in healthcare settings.
Materials and methods: Clinical notes from a cohort of 1,099 lung cancer patients were utilized, with a subset of 50 patients for testing purposes, and 102 patients used for model fine-tuning. This study evaluates the performance of multiple LLMs, including GPT-3.5, GPT-4, GPT-4o, and LLaMA 3 8b, in generating discharge summaries. Evaluation metrics included token-level analysis (BLEU, ROUGE-1, ROUGE-2, ROUGE-L), semantic similarity scores, and manual evaluation of clinical relevance, factual faithfulness, and completeness. An iterative method was further tested on LLaMA 3 8b using clinical notes of varying lengths to examine the stability of its performance.
Results: The study found notable variations in summarization capabilities among LLMs. GPT-4o and fine-tuned LLaMA 3 demonstrated superior token-level evaluation metrics, while manual evaluation further revealed that GPT-4 achieved the highest scores in relevance (4.95 ± 0.22) and factual faithfulness (4.40 ± 0.50), whereas GPT-4o performed best in completeness (4.55 ± 0.69); both models showed comparable overall quality. Semantic similarity scores indicated GPT-4o and LLaMA 3 as leading models in capturing the underlying meaning and context of clinical narratives.
Conclusion: This study contributes insights into the efficacy of LLMs for generating discharge summaries, highlighting the potential of automated summarization tools to enhance documentation precision and efficiency, ultimately improving patient care and operational capability in healthcare settings.
Volume 168, Article 104867
Citations: 0
KGiA: Drug repurposing through disease-aware knowledge graph augmentation
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics Pub Date: 2025-06-20 DOI: 10.1016/j.jbi.2025.104857
Çerağ Oğuztüzün, Zhenxiang Gao, Hui Li, Rong Xu
Abstract
Objective: Drug repurposing offers a cost-effective strategy to accelerate drug development by identifying new therapeutic uses for approved medications. Knowledge graphs (KGs) that capture large amounts of biomedical knowledge have recently been used for drug repurposing; however, KGs are inherently incomplete due to our limited biomedical knowledge.
Methods: We propose KGiA, an inductive graph augmentation method that supports semi-inductive reasoning, allowing models to generalize to previously unseen biomedical entities. KGiA enhances KGs using counterfactual relationships mined from disease-specific topological patterns. We apply it to a state-of-art biomedical KG constructed from six datasets including biomedical relationships extracted from biomedical literature, which comprised 1,614,801 triples and 100,563 entities, including 30,006 diseases.
Results: Across five augmented architectures, KGiA improves generalizability by up to 24× in Mean Reciprocal Rank (MRR) and outperforms the state-of-the-art KG-based drug repurposing model by up to 32%. We applied KGiA in four case studies of diseases including Alzheimer’s Disease and showed its promise in identifying novel repurposed candidate drugs.
Conclusion: We showed that leveraging counterfactual relationships derived from disease-specific graph structures to augment existing knowledge graphs improved performance in KG-based drug repurposing.
Volume 168, Article 104857
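The results above are stated in terms of Mean Reciprocal Rank (MRR), the average of 1/rank of the correct answer over a set of link-prediction queries. A minimal sketch of the standard definition follows; the function name and the example ranks are hypothetical and are not taken from the paper's evaluation.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank, where each rank is the 1-based position of the
    correct entity in the model's ranked candidate list for one query."""
    if not ranks:
        raise ValueError("at least one rank is required")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical example: the true drug is ranked 1st, 2nd and 4th
# for three (disease, treats, ?) queries.
example_mrr = mean_reciprocal_rank([1, 2, 4])
```

Because reciprocal ranks shrink quickly, MRR rewards placing the correct entity near the top of the list far more than it penalizes differences deep in the ranking.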
Citations: 0
Ontology enrichment using a large language model: Applying lexical, semantic, and knowledge network-based similarity for concept placement
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics Pub Date: 2025-06-19 DOI: 10.1016/j.jbi.2025.104865
Navya Martin Kollapally, James Geller, Vipina Kuttichi Keloth, Zhe He, Julia Xu
Abstract
Objective: Ontologies are essential for representing the knowledge of a domain. To make ontologies useful, they must encompass a comprehensive domain view. To achieve ontology enrichment, there is a need to discover new concepts to be added, either because they were missed in the first place, or the state-of-the-art has advanced to develop new real-world concepts. Our goal is to develop an automatic enrichment pipeline using a seed ontology, a Large Language Model (LLM), and source of text. The pipeline is applied to the domain of Social Determinants of Health (SDoH), using PubMed as a source of concepts. In this work, the applicability and effectiveness of the enrichment pipeline is demonstrated by extending the SDoH Ontology called SOHOv1, however our methodology could be used in other domains as well.
Methods: We first retrieved PubMed abstracts of candidate articles with existing SOHOv1 concepts as search terms. Next, we used GPT-4-1201 to extract semantic triples from the abstracts. We identified concepts from these triples utilizing lexical, semantic, and knowledge network-based filtering. We also compared the granularity of semantic triples extracted with our method to the triples in the SemMedDB (Semantic MEDLINE Database). The results were evaluated by human experts and standard ontology tools for checking consistency and semantic correctness.
Results: We expanded SOHOv1, which contained 173 concepts and 585 axioms, including 207 logical axioms to SOHOv2, which contains 572 concepts, 1,542 axioms, including 725 logical axioms. Our methods identified more concepts than those extracted from SemMedDB for the same task. While we have shown the feasibility of our approach for an SDoH ontology, the methodology is generalizable to other ontologies with an existing seed ontology and text corpus.
Conclusions: The contributions of this work are: Extracting semantic triples from PubMed abstracts using GPT-4-1201 utilizing prompt chaining; showing the superiority of triples from GPT-4-1201 over triples from SemMedDB for SDoH; using lexical and semantic similarity search techniques with knowledge network-based search to identify the concepts to be added to the ontology; confirming the quality of the new concepts with human experts.
Volume 168, Article 104865
Citations: 0
Corrigendum to “Theory of trust and acceptance of artificial intelligence technology (TrAAIT): An instrument to assess clinician trust and acceptance of artificial intelligence” [J. Biomed. Inform. 148 (2023) 104550]
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics Pub Date: 2025-06-16 DOI: 10.1016/j.jbi.2025.104863
Alexander F. Stevens, Pete Stetson
Abstract: none provided.
Volume 168, Article 104863
Citations: 0
A declarative approach for interactive process discovery in the clinical domain
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics Pub Date: 2025-06-14 DOI: 10.1016/j.jbi.2025.104862
Carlos Fernández-Llatas, Begoña Martínez-Salvador, Mar Marcos
Abstract
Objective: Process Mining (PM) is an established discipline with increasing adoption in the clinical domain. In this context, PM seeks to infer clinical processes from healthcare data collected in the Electronic Health Record. However, the particularities of clinical practice cause that, in most cases, the processes obtained result in an intricate network that hardly corresponds to clinical algorithms and, thus, are difficult to understand for clinical and IT personnel. To address these problems, our aim is to incorporate specialized clinical knowledge into the PM discovery algorithm.
Methods: We propose a declarative approach to interactive process discovery in the clinical domain. Concretely, we present a set of declarative techniques that allows clinicians to incorporate their knowledge in the process, based on the Declare formalism.
Results: The results of this work encompass both the declarative interactive approach and its implementation in the I-PALIA PM discovery algorithm, as well as an application to a use case for the treatment of prostate cancer. This application demonstrates that the implemented techniques are useful in managing typical problems that arise when applying PM methods to the clinical domain.
Conclusion: This work proposes a novel approach with techniques for interactive process discovery in the clinical domain. This approach not only allows the clinical expert to interactively incorporate specialized knowledge into the PM algorithm, but also serves to obtain process models that are more comprehensible and better resemble treatment procedures.
Volume 168, Article 104862
Citations: 0
Evaluating large language models for information extraction from gastroscopy and colonoscopy reports through multi-strategy prompting
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics Pub Date: 2025-06-10 DOI: 10.1016/j.jbi.2025.104844
Zhengqiu Yu, Lexin Fang, Yueping Ding, Yan Shen, Lei Xu, Yaozheng Cai, Xiangrong Liu
Abstract
Objective: To systematically evaluate large language models (LLMs) for automated information extraction from gastroscopy and colonoscopy reports through prompt engineering, addressing their ability to extract structured information, recognize complex patterns, and support diagnostic reasoning in clinical contexts.
Methods: We developed an evaluation framework incorporating three hierarchical tasks: basic entity extraction, pattern recognition, and diagnostic assessment. The study utilized a dataset of 162 endoscopic reports with structured annotations from clinical experts. Various language models, including proprietary, emerging, and open-source alternatives, were evaluated under both zero-shot and few-shot learning paradigms. For each task, multiple prompting strategies were implemented, including direct prompting and five Chain-of-Thought (CoT) prompting variants.
Results: Larger models with specialized architectures achieved better performance in entity extraction tasks but faced notable challenges in capturing spatial relationships and integrating clinical findings. The effectiveness of few-shot learning varied across models and tasks, with larger models showing more consistent improvement patterns.
Conclusion: These findings provide important insights into the current capabilities and limitations of language models in specialized medical domains, contributing to the development of more effective clinical documentation analysis systems.
Volume 168, Article 104844
Citations: 0
Tailoring task arithmetic to address bias in models trained on multi-institutional datasets
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics Pub Date: 2025-06-08 DOI: 10.1016/j.jbi.2025.104858
Xiruo Ding, Zhecheng Sheng, Brian Hur, Justin Tauscher, Dror Ben-Zeev, Meliha Yetişgen, Serguei Pakhomov, Trevor Cohen
Abstract
Objective: Multi-institutional datasets are widely used for machine learning from clinical data, to increase dataset size and improve generalization. However, deep learning models in particular may learn to recognize the source of a data element, leading to biased predictions. For example, deep learning models for image recognition trained on chest radiographs with COVID-19 positive and negative examples drawn from different data sources can respond to indicators of provenance (e.g., radiological annotations outside the lung area per institution-specific practices) rather than pathology, generalizing poorly beyond their training data. Bias of this sort, called confounding by provenance, is of concern in natural language processing (NLP) because provenance indicators (e.g., institution-specific section headers, or region-specific dialects) are pervasive in language data. Prior work on addressing such bias has focused on statistical methods, without providing a solution for deep learning models for NLP.
Methods: Recent work in representation learning has shown that representing the weights of a trained deep network as task vectors allows for their arithmetic composition to govern model capabilities towards desired behaviors. In this work, we evaluate the extent to which reducing a model’s ability to distinguish between contributing sites with such task arithmetic can mitigate confounding by provenance. To do so, we propose two model-agnostic methods, Task Arithmetic for Provenance Effect Reduction (TAPER) and Dominance-Aligned Polarized Provenance Effect Reduction (DAPPER), extending the task vectors approach to a novel problem domain.
Results: Evaluation on three datasets shows improved robustness to confounding by provenance for both RoBERTa and Llama-2 models with the task vector approach, with improved performance at the extremes of distribution shift.
Conclusion: This work emphasizes the importance of adjusting for confounding by provenance, especially in extreme cases of the shift. In use of deep learning models, DAPPER and TAPER show efficiency in mitigating such bias. They provide a novel mitigation strategy for confounding by provenance, with broad applicability to address other sources of bias in composite clinical data sets. Source code is available within the DeconDTN toolkit: https://github.com/LinguisticAnomalies/DeconDTN-toolkit
Volume 168, Article 104858
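The Methods text above relies on task vectors: the element-wise difference between fine-tuned and pretrained weights, which can be added to or subtracted from a model to strengthen or weaken a capability. The sketch below illustrates only that generic idea on toy weights. It is not the TAPER or DAPPER procedure, whose details the abstract does not give; all names and numbers are hypothetical.

```python
from typing import Dict, List

# Toy stand-in for a model state: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def task_vector(pretrained: Weights, finetuned: Weights) -> Weights:
    """Element-wise difference finetuned - pretrained for every parameter."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def apply_task_vector(base: Weights, vector: Weights, scale: float) -> Weights:
    """Return base + scale * vector; a negative scale subtracts the capability."""
    return {
        name: [b + scale * v for b, v in zip(base[name], vector[name])]
        for name in base
    }


# Hypothetical weights: a pretrained model and a copy fine-tuned to predict
# which site a record came from (the provenance signal to be suppressed).
pretrained = {"layer.weight": [0.5, -1.0, 2.0]}
site_classifier = {"layer.weight": [1.0, -1.5, 2.0]}

provenance_vector = task_vector(pretrained, site_classifier)
# Negating the vector moves the weights away from the site-prediction solution.
debiased = apply_task_vector(pretrained, provenance_vector, scale=-1.0)
```

In practice the same arithmetic would run over framework tensors rather than Python lists, and the scale would be a tuned hyperparameter; both points are assumptions here rather than statements from the abstract.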
Citations: 0