Navya Martin Kollapally , James Geller , Vipina Kuttichi Keloth , Zhe He , Julia Xu
{"title":"Ontology enrichment using a large language model: Applying lexical, semantic, and knowledge network-based similarity for concept placement","authors":"Navya Martin Kollapally , James Geller , Vipina Kuttichi Keloth , Zhe He , Julia Xu","doi":"10.1016/j.jbi.2025.104865","DOIUrl":"10.1016/j.jbi.2025.104865","url":null,"abstract":"<div><h3>Objective</h3><div>Ontologies are essential for representing the knowledge of a domain. To make ontologies useful, they must encompass a comprehensive domain view. To achieve ontology enrichment, there is a need to discover new concepts to be added, either because they were missed in the first place or because the state of the art has advanced and produced new real-world concepts. Our goal is to develop an automatic enrichment pipeline using a seed ontology, a Large Language Model (LLM), and a text source. The pipeline is applied to the domain of Social Determinants of Health (SDoH), using PubMed as a source of concepts. In this work, the applicability and effectiveness of the enrichment pipeline are demonstrated by extending the SDoH Ontology called SOHOv1; however, our methodology could be used in other domains as well.</div></div><div><h3>Methods</h3><div>We first retrieved PubMed abstracts of candidate articles with existing SOHOv1 concepts as search terms. Next, we used GPT-4-1201 to extract semantic triples from the abstracts. We identified concepts from these triples utilizing lexical, semantic, and knowledge network-based filtering. We also compared the granularity of semantic triples extracted with our method to the triples in SemMedDB (Semantic MEDLINE Database). The results were evaluated by human experts and with standard ontology tools that check consistency and semantic correctness.</div></div><div><h3>Results</h3><div>We expanded SOHOv1, which contained 173 concepts and 585 axioms, including 207 logical axioms, to SOHOv2, which contains 572 concepts and 1,542 axioms, including 725 logical axioms. 
Our methods identified more concepts than those extracted from SemMedDB for the same task. While we have shown the feasibility of our approach for an SDoH ontology, the methodology is generalizable to other ontologies with an existing seed ontology and text corpus.</div></div><div><h3>Conclusions</h3><div>The contributions of this work are: Extracting semantic triples from PubMed abstracts using GPT-4-1201 utilizing <em>prompt chaining</em>; showing the superiority of triples from GPT-4-1201 over triples from SemMedDB for SDoH; using lexical and semantic similarity search techniques with knowledge network-based search to identify the concepts to be added to the ontology; confirming the quality of the new concepts with human experts.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"168 ","pages":"Article 104865"},"PeriodicalIF":4.0,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144340171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Carlos Fernández-Llatas , Begoña Martínez-Salvador , Mar Marcos
{"title":"A declarative approach for interactive process discovery in the clinical domain","authors":"Carlos Fernández-Llatas , Begoña Martínez-Salvador , Mar Marcos","doi":"10.1016/j.jbi.2025.104862","DOIUrl":"10.1016/j.jbi.2025.104862","url":null,"abstract":"<div><h3>Objective:</h3><div>Process Mining (PM) is an established discipline with increasing adoption in the clinical domain. In this context, PM seeks to infer clinical processes from healthcare data collected in the Electronic Health Record. However, the particularities of clinical practice mean that, in most cases, the discovered processes form intricate networks that hardly correspond to clinical algorithms and are therefore difficult for clinical and IT personnel to understand. To address these problems, our aim is to incorporate specialized clinical knowledge into the PM discovery algorithm.</div></div><div><h3>Methods:</h3><div>We propose a declarative approach to interactive process discovery in the clinical domain. Concretely, we present a set of declarative techniques, based on the Declare formalism, that allow clinicians to incorporate their knowledge into the discovery process.</div></div><div><h3>Results:</h3><div>The results of this work encompass both the declarative interactive approach and its implementation in the I-PALIA PM discovery algorithm, as well as an application to a use case for the treatment of prostate cancer. This application demonstrates that the implemented techniques are useful in managing typical problems that arise when applying PM methods to the clinical domain.</div></div><div><h3>Conclusion:</h3><div>This work proposes a novel approach with techniques for interactive process discovery in the clinical domain. 
This approach not only allows the clinical expert to interactively incorporate specialized knowledge into the PM algorithm, but also serves to obtain process models that are more comprehensible and better resemble treatment procedures.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"168 ","pages":"Article 104862"},"PeriodicalIF":4.0,"publicationDate":"2025-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144309987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhengqiu Yu , Lexin Fang , Yueping Ding , Yan Shen , Lei Xu , Yaozheng Cai , Xiangrong Liu
{"title":"Evaluating large language models for information extraction from gastroscopy and colonoscopy reports through multi-strategy prompting","authors":"Zhengqiu Yu , Lexin Fang , Yueping Ding , Yan Shen , Lei Xu , Yaozheng Cai , Xiangrong Liu","doi":"10.1016/j.jbi.2025.104844","DOIUrl":"10.1016/j.jbi.2025.104844","url":null,"abstract":"<div><h3>Objective:</h3><div>To systematically evaluate large language models (LLMs) for automated information extraction from gastroscopy and colonoscopy reports through prompt engineering, addressing their ability to extract structured information, recognize complex patterns, and support diagnostic reasoning in clinical contexts.</div></div><div><h3>Methods:</h3><div>We developed an evaluation framework incorporating three hierarchical tasks: basic entity extraction, pattern recognition, and diagnostic assessment. The study utilized a dataset of 162 endoscopic reports with structured annotations from clinical experts. Various language models, including proprietary, emerging, and open-source alternatives, were evaluated under both zero-shot and few-shot learning paradigms. For each task, multiple prompting strategies were implemented, including direct prompting and five Chain-of-Thought (CoT) prompting variants.</div></div><div><h3>Results:</h3><div>Larger models with specialized architectures achieved better performance in entity extraction tasks but faced notable challenges in capturing spatial relationships and integrating clinical findings. 
The effectiveness of few-shot learning varied across models and tasks, with larger models showing more consistent improvement patterns.</div></div><div><h3>Conclusion:</h3><div>These findings provide important insights into the current capabilities and limitations of language models in specialized medical domains, contributing to the development of more effective clinical documentation analysis systems.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"168 ","pages":"Article 104844"},"PeriodicalIF":4.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144284443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genlang Chen , Sixuan Sui , Jiajian Zhang , Xuan Liu , Ping Cai
{"title":"An attention-based framework for integrating WSI and genomic data in cancer survival prediction","authors":"Genlang Chen , Sixuan Sui , Jiajian Zhang , Xuan Liu , Ping Cai","doi":"10.1016/j.jbi.2025.104836","DOIUrl":"10.1016/j.jbi.2025.104836","url":null,"abstract":"<div><h3>Objective:</h3><div>Cancer survival prediction plays a vital role in enhancing medical decision-making and optimizing patient management. Accurate survival estimation enables healthcare providers to develop personalized treatment plans, improve treatment outcomes, and identify high-risk patients for timely intervention. However, existing methods often rely on single-modality data or suffer from excessive computational complexity, limiting their practical application and the full potential of multimodal integration.</div></div><div><h3>Methods:</h3><div>To address these challenges, we propose a novel multimodal survival prediction framework that integrates Whole Slide Image (WSI) and genomic data. The framework employs attention mechanisms to model intra-modal and inter-modal correlations, effectively capturing complex dependencies within and between modalities. Additionally, locality-sensitive hashing is applied to optimize the self-attention mechanism, significantly reducing computational costs while maintaining predictive performance, enabling the model to handle large-scale or high-resolution WSI datasets efficiently.</div></div><div><h3>Results:</h3><div>Extensive experiments on the TCGA-BLCA dataset validate the effectiveness of the proposed approach. The results demonstrate that integrating WSI and genomic data improves survival prediction accuracy compared to unimodal methods. 
The optimized self-attention mechanism further enhances model efficiency, allowing for practical implementation on large datasets.</div></div><div><h3>Conclusion:</h3><div>The proposed framework provides a robust and efficient solution for cancer survival prediction by leveraging multimodal data integration and optimized attention mechanisms. This study highlights the importance of multimodal learning in medical applications and offers a promising direction for future advancements in AI-driven clinical decision support systems.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"168 ","pages":"Article 104836"},"PeriodicalIF":4.0,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144248035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ningtao Liu , Shuiping Gou , Ruoxi Gao , Binxiao Su , Wenbo Liu , Claire K.S. Park , Shuwei Xing , Jing Yuan , Aaron Fenster
{"title":"GRU-TV: Time- and Velocity-aware Gated Recurrent Unit for patient representation","authors":"Ningtao Liu , Shuiping Gou , Ruoxi Gao , Binxiao Su , Wenbo Liu , Claire K.S. Park , Shuwei Xing , Jing Yuan , Aaron Fenster","doi":"10.1016/j.jbi.2025.104855","DOIUrl":"10.1016/j.jbi.2025.104855","url":null,"abstract":"<div><h3>Objective:</h3><div>The multivariate clinical temporal series (MCTS) extracted from electronic health records (EHRs) can characterize dynamic physiological processes. Previous deep patient representation models were proposed to address imputation values and irregular sampling in MCTS. However, the change in physiological status, particularly its instantaneous velocity, has not received adequate attention.</div></div><div><h3>Methods:</h3><div>To address this gap, we propose a Time- and Velocity-aware Gated Recurrent Unit (GRU-TV) model for patient representation learning. In the GRU-TV model, we apply a neural ordinary differential equation to describe the instantaneous velocity of the patient’s physiological status. This instantaneous velocity is embedded in the hidden-state update of the basic GRU model so that the model accounts for uneven time intervals. In addition, the forward propagation of the GRU-TV model incorporates this instantaneous velocity to capture non-uniform changes in the patient’s physiological status over time.</div></div><div><h3>Results:</h3><div>The performance of the GRU-TV model is evaluated on multiple clinical concerns across two real-world datasets. The average AUC for the sub-tasks on the complete, 70% sampled, and 50% sampled PhysioNet2012 datasets are 0.89, 0.84, and 0.83, respectively. The average AUC for the acute care phenotype classification on the complete, 20% sampled, and 10% sampled MIMIC-III datasets are 0.84, 0.82, and 0.80, respectively. 
The mean absolute deviation of the length-of-stay regression task is 1.84 days.</div></div><div><h3>Conclusion:</h3><div>The superior performance underscores the importance of instantaneous physiological changes in patient representation and clinical decision-making, particularly under challenging data conditions.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"168 ","pages":"Article 104855"},"PeriodicalIF":4.0,"publicationDate":"2025-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144239851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuxi Liu , Zhenhao Zhang , Jiacong Mi , Shirui Pan , Tianlong Chen , Yi Guo , Xing He , Jiang Bian
{"title":"GatorCLR: Personalized predictions of patient outcomes on electronic health records using self-supervised contrastive graph representation","authors":"Yuxi Liu , Zhenhao Zhang , Jiacong Mi , Shirui Pan , Tianlong Chen , Yi Guo , Xing He , Jiang Bian","doi":"10.1016/j.jbi.2025.104851","DOIUrl":"10.1016/j.jbi.2025.104851","url":null,"abstract":"<div><h3>Objective:</h3><div>Recently, there has been growing interest in analyzing large amounts of Electronic Health Record (EHR) data. Patient outcome prediction is a major area of interest in EHR analysis that focuses on predicting the future health status of patients using structured data types, such as diagnoses, medications, and procedures collected from longitudinal EHR data. We investigate and design self-supervised learning (SSL) paradigms to learn high-quality representations from longitudinal EHR data, aiming to effectively capture longitudinal relationships and patterns for improved patient outcome predictions.</div></div><div><h3>Methods:</h3><div>We propose an end-to-end, novel, and robust model called GatorCLR that aligns with the contrastive SSL paradigm. GatorCLR incorporates graph analysis-based patient modeling into longitudinal EHR data, generating graph representations of nodes and edges representing patients, their relationships, and similarities. A two-layer augmentation technique is further incorporated in our GatorCLR that generates consistent, identity-preserving augmentations from graph representations.</div></div><div><h3>Results:</h3><div>We evaluate our approach using real-world EHR datasets. 
Experimental results indicate that our GatorCLR delivers meaningful and robust performance across multiple clinical tasks and datasets and provides transparency of the model decisions.</div></div><div><h3>Conclusion:</h3><div>The proposed approach presents a significant step toward developing a foundation model with longitudinal EHR data, capable of making informed predictions and adaptable to various downstream use cases and tasks. This study should, therefore, be of value to practitioners wishing to leverage longitudinal EHR data for predictive analytics.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"168 ","pages":"Article 104851"},"PeriodicalIF":4.0,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144212009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ziqi Guo , Jack Felag , Jordan C. Rozum , Rion Brattig Correia , Xuan Wang , Luis M. Rocha
{"title":"Focused digital cohort selection from social media using the metric backbone of biomedical knowledge graphs","authors":"Ziqi Guo , Jack Felag , Jordan C. Rozum , Rion Brattig Correia , Xuan Wang , Luis M. Rocha","doi":"10.1016/j.jbi.2025.104847","DOIUrl":"10.1016/j.jbi.2025.104847","url":null,"abstract":"<div><div>Social media data allows researchers to construct large <em>digital cohorts</em> — groups of users who post health-related content — to study the interplay between human behavior and medical treatment. Identifying the users most relevant to a specific health problem is, however, a challenge in that social media sites vary in the generality of their discourse. While X (formerly Twitter), Instagram, and Facebook cater to wide ranging topics, Reddit subgroups and dedicated patient advocacy forums trade in much more specific, biomedically-relevant discourse.</div><div>To filter relevant users on any social media, we have developed a general method and tested it on epilepsy discourse. We analyzed the text from posts by users who mention epilepsy drugs at least once in the general-purpose social media sites X and Instagram, the epilepsy-focused Reddit subgroup (r/Epilepsy), and the Epilepsy Foundation of America (EFA) forums. We used a curated medical terminology dictionary to generate a knowledge graph (KG) from each social media site, whereby nodes represent terms, and edge weights denote the strength of association between pairs of terms in the collected text.</div><div>Our method is based on computing the metric backbone of each KG, which yields the (sparsified) subgraph of edges that participate in shortest paths. By comparing the subset of users who contribute to the backbone to the subset who do not, we show that epilepsy-focused social media users contribute to the KG backbone in much higher proportion than do general-purpose social media users. 
Furthermore, using human annotation of Instagram posts, we demonstrate that users who do not contribute to the backbone are much more likely to use dictionary terms in a manner inconsistent with their biomedical meaning and are rightly excluded from the cohort of interest.</div><div>Our metric backbone approach, thus, has several benefits: it yields focused user cohorts who engage in discourse relevant to a targeted biomedical problem; unlike engagement-based approaches, it can retain low-engagement users who nonetheless contribute meaningful biomedical insights and filter out very vocal users who contribute no relevant content; and it is parameter-free, algebraically principled, requires neither classifiers nor human curation, and is simple to compute with the open-source code we provide.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"168 ","pages":"Article 104847"},"PeriodicalIF":4.0,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144215947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
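The metric backbone computation described in the abstract above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' released code: it assumes edge weights are proximities in (0, 1], converts each one to a distance with d = 1/p - 1 (an assumed conversion), and keeps an edge only when no indirect path between its endpoints is shorter than the edge itself. The names `proximity_to_distance` and `metric_backbone` are hypothetical, and the step that maps backbone edges back to contributing users is not shown.

```python
import math
from itertools import product


def proximity_to_distance(p: float) -> float:
    """Assumed proximity-to-distance map: p=1 -> 0, p->0 -> infinity."""
    return math.inf if p <= 0 else 1.0 / p - 1.0


def metric_backbone(proximity_edges: dict) -> set:
    """Return the undirected edges (as sorted tuples) that lie on shortest paths.

    proximity_edges maps (u, v) term pairs to a proximity in (0, 1].
    """
    nodes = sorted({n for edge in proximity_edges for n in edge})
    dist = {(a, b): (0.0 if a == b else math.inf) for a, b in product(nodes, repeat=2)}
    direct = {}
    for (u, v), p in proximity_edges.items():
        d = proximity_to_distance(p)
        key = tuple(sorted((u, v)))
        direct[key] = min(direct.get(key, math.inf), d)
        dist[(u, v)] = dist[(v, u)] = min(dist[(u, v)], d)

    # Floyd-Warshall: path length is the sum of edge distances along the path.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                via_k = dist[(i, k)] + dist[(k, j)]
                if via_k < dist[(i, j)]:
                    dist[(i, j)] = via_k

    # Keep an edge only if its direct distance equals the shortest-path distance;
    # edges with a shorter indirect route ("semi-metric" edges) are dropped.
    tol = 1e-12
    return {key for key, d in direct.items() if d <= dist[key] + tol}


if __name__ == "__main__":
    toy_kg = {("a", "b"): 0.5, ("b", "c"): 0.5, ("a", "c"): 0.25}
    print(sorted(metric_backbone(toy_kg)))  # [('a', 'b'), ('b', 'c')]
```

In the toy graph, the direct a-c distance (3) exceeds the two-hop route through b (1 + 1 = 2), so only a-b and b-c remain. Floyd-Warshall is used here for brevity; because it is cubic in the number of terms, running Dijkstra from each node would be a more practical choice for a large, sparse KG.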
Yi Shi , Anna Sun , Hongmei Nan , Yuedi Yang , Jing Xu , Michael T Eadon , Jing Su , Pengyue Zhang
{"title":"A trajectory-informed model for detecting drug-drug-host interaction from real-world data","authors":"Yi Shi , Anna Sun , Hongmei Nan , Yuedi Yang , Jing Xu , Michael T Eadon , Jing Su , Pengyue Zhang","doi":"10.1016/j.jbi.2025.104859","DOIUrl":"10.1016/j.jbi.2025.104859","url":null,"abstract":"<div><h3>Objective</h3><div>Adverse drug events (ADEs) are a significant challenge to public health. Data mining methods have been developed to identify signals of drug-drug interaction-induced (DDI-induced) or drug-host interaction-induced (DHI-induced) ADEs from real-world data; we aim to develop a new method to detect adverse drug-drug interactions with special attention to patient characteristics.</div></div><div><h3>Methods</h3><div>We developed a trajectory-informed model (TIM) to identify signals of adverse DDI with special attention to patient characteristics (i.e., drug-drug-host interaction [DDHI]). We also proposed a study design based on an optimal selection of within-subject and between-subjects controls for detecting ADEs from real-world data. We analyzed a large-scale US administrative claims dataset and conducted a simulation study.</div></div><div><h3>Results</h3><div>In the administrative claims data analysis, we developed optimally matched case-control datasets for potential ADEs including acute kidney injury and gastrointestinal bleeding. We found that optimal selection of controls yielded a higher AUC than traditional designs for ADE detection (AUCs: 0.79–0.80 vs. 0.56–0.76). We observed that TIM detected more signals than reference methods (odds ratios: 1.13–3.18, P < 0.01), and found that 36% of all signals generated by TIM were DDHI signals. 
In a simulation study, we demonstrated that TIM kept the empirical false discovery rate (FDR) below the desired value of 0.05 and had a more than 1.4-fold higher probability of detecting DDHI signals than reference methods.</div></div><div><h3>Conclusions</h3><div>TIM had a high probability of identifying signals of adverse DDI and DDHI in high-throughput ADE mining while controlling the false positive rate. A significant portion of drug-drug combinations were associated with an increased risk of ADEs only in specific patient subpopulations. Optimal selection of within-subject and between-subjects controls could improve the performance of ADE data mining.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"168 ","pages":"Article 104859"},"PeriodicalIF":4.0,"publicationDate":"2025-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144204446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feizhong Zhou , Xingyue Liu , Qiao Zeng , Zhuhan Li , Hanguang Xiao
{"title":"SigPhi-Med: A lightweight vision-language assistant for biomedicine","authors":"Feizhong Zhou, Xingyue Liu, Qiao Zeng, Zhuhan Li, Hanguang Xiao","doi":"10.1016/j.jbi.2025.104849","DOIUrl":"10.1016/j.jbi.2025.104849","url":null,"abstract":"<div><h3>Background:</h3><div>Recent advancements in general multimodal large language models (MLLMs) have led to substantial improvements in the performance of biomedical MLLMs across diverse medical tasks, exhibiting significant transformative potential. However, the large number of parameters in MLLMs necessitates substantial computational resources during both training and inference stages, thereby limiting their feasibility in resource-constrained clinical settings. This study aims to develop a lightweight biomedical multimodal small language model (MSLM) to mitigate this limitation.</div></div><div><h3>Methods:</h3><div>We replaced the large language model (LLM) in MLLMs with a small language model (SLM), resulting in a significant reduction in the number of parameters. To ensure that the model maintains strong performance on biomedical tasks, we systematically analyzed the effects of key components of biomedical MSLMs, including the SLM, vision encoder, training strategy, and training data, on model performance. Based on these analyses, we implemented specific optimizations for the model.</div></div><div><h3>Results:</h3><div>Experiments demonstrate that the performance of biomedical MSLMs is significantly influenced by the parameter count of the SLM component, the pre-training strategy and resolution of the vision encoder component, and both the quality and quantity of the training data. 
Compared to several state-of-the-art models, including LLaVA-Med-v1.5 (7B), LLaVA-Med (13B), and Med-MoE (2.7B × 4), our optimized model, SigPhi-Med, with only 4.2B parameters, achieves significantly superior overall performance across the VQA-RAD, SLAKE, and Path-VQA medical visual question-answering (VQA) benchmarks.</div></div><div><h3>Conclusions:</h3><div>This study highlights the significant potential of biomedical MSLMs in biomedical applications, presenting a more cost-effective approach for deploying AI assistants in healthcare settings. Additionally, our analysis of MSLMs' key components provides valuable insights for their development in other specialized domains. Our code is available at <span><span>https://github.com/NyKxo1/SigPhi-Med</span></span>.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"167 ","pages":"Article 104849"},"PeriodicalIF":4.0,"publicationDate":"2025-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144189961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sai Krishna Vallamchetla , Omar Abdelkader , Ali Elnaggar , Doaa Ramadan , Md Manjurul Islam Shourav , Irbaz B. Riaz , Michelle P. Lin
{"title":"Do it faster with PICOS: Generative AI-Assisted systematic review screening","authors":"Sai Krishna Vallamchetla , Omar Abdelkader , Ali Elnaggar , Doaa Ramadan , Md Manjurul Islam Shourav , Irbaz B. Riaz , Michelle P. Lin","doi":"10.1016/j.jbi.2025.104860","DOIUrl":"10.1016/j.jbi.2025.104860","url":null,"abstract":"<div><h3>Background</h3><div>Systematic reviews (SRs) require substantial time and human resources, especially during the screening phase. Large Language Models (LLMs) have shown the potential to expedite screening. However, their use in generating structured PICOS (Population, Intervention/Exposure, Comparison, Outcome, Study design) summaries from title and abstract to assist human reviewers during screening remains unexplored.</div></div><div><h3>Objective</h3><div>To assess the impact of open-source (Mistral-Nemo-Instruct-2407) LLM-generated structured PICOS summaries on the speed and accuracy of title and abstract screening.</div></div><div><h3>Methods</h3><div>Four neurology trainees were grouped into two pairs based on previous screening experience. Pair A (A1, A2) consisted of less experienced trainees (1–2 SR), while Pair B (B1, B2) consisted of more experienced trainees (≥3 SR). Reviewers A1 and B1 received titles, abstracts, and LLM-generated structured PICOS summaries for each article. Reviewers A2 and B2 received only titles and abstracts. All reviewers independently screened the same set of 1,003 articles using predefined eligibility criteria. Screening times were recorded, and performance metrics were calculated.</div></div><div><h3>Results</h3><div>PICOS-assisted reviewers screened significantly faster (A1: 116 min; B1: 90 min) than those without (A2: 463 min; B2: 370 min), with approximately 75% reduction in screening workload. Sensitivity was perfect for PICOS-assisted reviewers (100%), whereas it was lower for those without assistance (88.0% and 92.0%). 
Furthermore, PICOS-assisted reviewers demonstrated higher accuracy (99.9%), specificity (99.9%), F1 scores (98.0%), and strong inter-rater reliability (Cohen’s Kappa of 99.8%). The less experienced reviewer with PICOS assistance (A1) outperformed the experienced reviewer without assistance (B2) in both efficiency and sensitivity.</div></div><div><h3>Conclusion</h3><div>LLM-generated PICOS summaries enhance the speed and accuracy of title and abstract screening by providing an additional layer of structured information. With PICOS assistance, the less experienced reviewer surpassed a more experienced peer working without it. Future research should explore the applicability of this novel method across diverse fields outside of neurology and its integration into fully automated systems.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"168 ","pages":"Article 104860"},"PeriodicalIF":4.0,"publicationDate":"2025-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144187104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
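The roughly 75% reduction in screening workload reported in the PICOS abstract can be checked against the per-reviewer screening times it lists. The short Python sketch below computes the relative time saving within each pair. It assumes the reduction is measured as 1 - assisted/unassisted within a pair, which the abstract does not state explicitly; the names `SCREENING_MINUTES` and `time_reduction` are hypothetical.

```python
# Screening times in minutes, as reported in the abstract.
SCREENING_MINUTES = {"A1": 116, "B1": 90, "A2": 463, "B2": 370}


def time_reduction(assisted: float, unassisted: float) -> float:
    """Fractional reduction in screening time, assisted vs. unassisted reviewer."""
    return 1 - assisted / unassisted


# Pair A: less experienced trainees; Pair B: more experienced trainees.
pair_a = time_reduction(SCREENING_MINUTES["A1"], SCREENING_MINUTES["A2"])
pair_b = time_reduction(SCREENING_MINUTES["B1"], SCREENING_MINUTES["B2"])
print(f"Pair A: {pair_a:.1%}, Pair B: {pair_b:.1%}")  # Pair A: 74.9%, Pair B: 75.7%
```

Both pairs come out close to 75%. Because each comparison is between two different reviewers, the figure mixes the effect of the PICOS summaries with individual reviewer speed, so it is best read as descriptive rather than causal.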