Chunlong Miao, Jingjing Luo, Yan Liang, Hong Liang, Yuhui Cen, Shijie Guo, Hongliu Yu
{"title":"Long-term care plan recommendation for older adults with disabilities: a bipartite graph transformer and self-supervised approach.","authors":"Chunlong Miao, Jingjing Luo, Yan Liang, Hong Liang, Yuhui Cen, Shijie Guo, Hongliu Yu","doi":"10.1093/jamia/ocae327","DOIUrl":"10.1093/jamia/ocae327","url":null,"abstract":"<p><strong>Background: </strong>With the global population aging and advancements in the medical system, long-term care in healthcare institutions and home settings has become essential for older adults with disabilities. However, the diverse and scattered care requirements of these individuals make developing effective long-term care plans heavily reliant on professional nursing staff, and even experienced caregivers may make mistakes or face confusion during the care plan development process. Consequently, there is a rigid demand for intelligent systems that can recommend comprehensive long-term care plans for older adults with disabilities who have stable clinical conditions.</p><p><strong>Objective: </strong>This study aims to utilize deep learning methods to recommend comprehensive care plans for the older adults with disabilities.</p><p><strong>Methods: </strong>We model the care data of older adults with disabilities using a bipartite graph. Additionally, we employ a prediction-based graph self-supervised learning (SSL) method to mine deep representations of graph nodes. Furthermore, we propose a novel graph Transformer architecture that incorporates eigenvector centrality to augment node features and uses graph structural information as references for the self-attention mechanism. Ultimately, we present the Bipartite Graph Transformer (BiT) model to provide personalized long-term care plan recommendation.</p><p><strong>Results: </strong>We constructed a bipartite graph comprising of 1917 nodes and 195 240 edges derived from real-world care data. 
The proposed model demonstrates outstanding performance, achieving an overall F1 score of 0.905 for care plan recommendations. Each care service item reached an average F1 score of 0.897, indicating that the BiT model is capable of accurately selecting services and effectively balancing the trade-off between incorrect and missed selections.</p><p><strong>Discussion: </strong>The BiT model proposed in this paper demonstrates strong potential for improving long-term care plan recommendations by leveraging bipartite graph modeling and graph SSL. This approach addresses the challenges of manual care planning, such as inefficiency, bias, and errors, by offering personalized and data-driven recommendations. While the model excels in common care items, its performance on rare or complex services could be enhanced with further refinement. These findings highlight the model's ability to provide scalable, AI-driven solutions to optimize care planning, though future research should explore its applicability across diverse healthcare settings and service types.</p><p><strong>Conclusions: </strong>Compared to previous research, the novel model proposed in this article effectively learns latent topology in bipartite graphs and achieves superior recommendation performance. 
Our study demonstrates the applicability of SSL and graph transformers in recommending long-term care plans for older adults with disabilities.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"689-701"},"PeriodicalIF":4.7,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12079649/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143069022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
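The BiT abstract describes augmenting node features with eigenvector centrality before feeding them to a Transformer. A minimal sketch of that feature-augmentation step is below; it is an illustration on a tiny fabricated bipartite graph, not the authors' implementation. Note that power iteration is run on A + I rather than A: bipartite adjacency matrices have spectra symmetric about zero, so plain power iteration can oscillate, while A + I shares A's eigenvectors and has a strictly dominant eigenvalue.

```python
import numpy as np

def eigenvector_centrality(adj, iters=500, tol=1e-12):
    """Eigenvector centrality by power iteration. Iterating on (A + I)
    avoids the oscillation plain power iteration can exhibit on bipartite
    graphs; A and A + I share the same eigenvectors."""
    n = adj.shape[0]
    m = adj + np.eye(n)
    x = np.ones(n) / np.sqrt(n)
    for _ in range(iters):
        x_new = m @ x
        x_new /= np.linalg.norm(x_new)
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x_new

# Toy bipartite graph: "older adult" nodes 0-2, "care service" nodes 3-4;
# edges only cross the partition.
edges = [(0, 3), (0, 4), (1, 3), (2, 4)]
n = 5
adj = np.zeros((n, n))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0

centrality = eigenvector_centrality(adj)

# Append the centrality score to each node's feature vector, mirroring the
# paper's idea of injecting structural signals into the node features.
node_feats = np.random.default_rng(0).normal(size=(n, 8))
augmented = np.concatenate([node_feats, centrality[:, None]], axis=1)
print(augmented.shape)  # (5, 9)
```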
Brian J McInnis, Ramona Pindus, Daniah H Kareem, Julie Cakici, Daniela G Vital, Eric Hekler, Camille Nebeker
{"title":"Using dataflow diagrams to support research informed consent data management communications: participant perspectives.","authors":"Brian J McInnis, Ramona Pindus, Daniah H Kareem, Julie Cakici, Daniela G Vital, Eric Hekler, Camille Nebeker","doi":"10.1093/jamia/ocaf004","DOIUrl":"10.1093/jamia/ocaf004","url":null,"abstract":"<p><strong>Objectives: </strong>Digital health research involves collecting vast amounts of personal health data, making data management practices complex and challenging to convey during informed consent.</p><p><strong>Materials and methods: </strong>We conducted eight semi-structured focus groups to explore whether dataflow diagrams (DFD) can complement informed consent and improve participants' understanding of data management and associated risks (N = 34 participants).</p><p><strong>Results: </strong>Our analysis found that DFDs could supplement text-based information about data management and sharing practices, such as by helping raise new questions that prompt conversation between prospective participants and members of a research team. Participants in the study emphasized the need for clear, simple, and accessible diagrams that are participant centered. Third-party access to data and sharing of sensitive health data were identified as high-risk areas requiring thorough explanation. Participants generally agreed that the design process should be led by the research team, but it should incorporate many diverse perspectives to ensure the diagram was meaningful to potential participants who are likely unfamiliar with data management. Nearly all participants rejected the idea that artificial intelligence could identify risks during the design process, but most were comfortable with it being used as a tool to format and simplify the diagram. 
In short, DFDs may complement standard text-based informed consent documents, but they are not a replacement.</p><p><strong>Discussion: </strong>Prospective research participants value diverse ways of learning about study risks and benefits. Our study highlights the value of incorporating information visualizations, such as DFDs, into informed consent procedures for research participation.</p><p><strong>Conclusion: </strong>Future research should explore other ways of visualizing consent information that help people overcome digital and data literacy barriers to participating in research. However, creating a DFD requires significant time and effort from research teams. To alleviate these costs, research sponsors can support the creation of shared infrastructure and communities of practice, and incentivize researchers to develop better consent procedures.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"712-723"},"PeriodicalIF":4.7,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12005621/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143191205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Suzanne V Blackley, Ying-Chih Lo, Sheril Varghese, Frank Y Chang, Oliver D James, Diane L Seger, Kimberly G Blumenthal, Foster R Goss, Li Zhou
{"title":"Building an allergy reconciliation module to eliminate allergy discrepancies in electronic health records.","authors":"Suzanne V Blackley, Ying-Chih Lo, Sheril Varghese, Frank Y Chang, Oliver D James, Diane L Seger, Kimberly G Blumenthal, Foster R Goss, Li Zhou","doi":"10.1093/jamia/ocaf022","DOIUrl":"10.1093/jamia/ocaf022","url":null,"abstract":"<p><strong>Objective: </strong>Accurate, complete allergy histories are critical for decision-making and medication prescription. However, allergy information is often spread across the electronic health record (EHR); thus, allergy lists are often inaccurate or incomplete. Discrepant allergy information can lead to suboptimal or unsafe clinical care and contribute to alert fatigue. We developed an allergy reconciliation module within Mass General Brigham (MGB)'s EHR to support accurate and intuitive reconciliation of discrepancies in the allergy list, thereby enhancing patient safety.</p><p><strong>Materials and methods: </strong>We combined data-driven methods and knowledge from domain experts to develop 5 mechanisms to compare allergy information across the EHR and designed a user interface to display discrepancies and suggested reconciliation actions, with links to relevant data sources. Qualitative and quantitative analyses were conducted to assess the module's performance and measure user acceptance.</p><p><strong>Results: </strong>We implemented and tested the proposed allergy reconciliation mechanisms and module. A comprehensive integration workflow was developed for the module, which was piloted among 111 primary care physicians at MGB. F1 scores of the reconciliation mechanisms range from 0.86 to 1.0. Qualitative analysis showed majority positive feedback from pilot users.</p><p><strong>Discussion: </strong>Our allergy reconciliation module achieved high performance, and physicians who used it largely accepted its recommendations. However, 56% of the pilot group ultimately did not use the module. 
User engagement and education are likely needed to increase adoption.</p><p><strong>Conclusion: </strong>We built a module to automatically identify discrepancies within patients' allergy records and remind providers to reconcile and update the allergy list. Its high accuracy shows promise for enhancing patient safety and utility of drug allergy alerts.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"648-655"},"PeriodicalIF":4.7,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12005630/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143450818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
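The allergy reconciliation abstract describes comparing the structured allergy list against allergy information scattered elsewhere in the EHR and suggesting reconciliation actions. A toy sketch of that comparison logic follows; the data, field names, and rules are hypothetical and far simpler than MGB's five expert-designed mechanisms.

```python
def find_allergy_discrepancies(allergy_list, other_sources):
    """Compare the coded allergy list against allergen mentions found
    elsewhere in the chart (notes, medication history, outside records)
    and suggest a reconciliation action for each discrepancy. Toy logic."""
    listed = {a.lower() for a in allergy_list}
    suggestions = []
    for source, mentions in other_sources.items():
        for allergen, status in mentions:
            name = allergen.lower()
            if status == "allergic" and name not in listed:
                suggestions.append((allergen, f"add to list (documented in {source})"))
            elif status == "tolerated" and name in listed:
                suggestions.append((allergen, f"consider removing (tolerated per {source})"))
    return suggestions

# Hypothetical patient chart fragments
allergy_list = ["Penicillin", "Latex"]
other_sources = {
    "progress note": [("sulfamethoxazole", "allergic")],
    "medication history": [("penicillin", "tolerated")],
}
for allergen, action in find_allergy_discrepancies(allergy_list, other_sources):
    print(f"{allergen}: {action}")
# sulfamethoxazole: add to list (documented in progress note)
# penicillin: consider removing (tolerated per medication history)
```

A real module would of course surface these as suggested actions for a clinician to confirm, with links to the source data, rather than auto-editing the list.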
C Jason Liang, Chongliang Luo, Henry R Kranzler, Jiang Bian, Yong Chen
{"title":"Communication-efficient federated learning of temporal effects on opioid use disorder with data from distributed research networks.","authors":"C Jason Liang, Chongliang Luo, Henry R Kranzler, Jiang Bian, Yong Chen","doi":"10.1093/jamia/ocae313","DOIUrl":"10.1093/jamia/ocae313","url":null,"abstract":"<p><strong>Objective: </strong>To develop a distributed algorithm to fit multi-center Cox regression models with time-varying coefficients to facilitate privacy-preserving data integration across multiple health systems.</p><p><strong>Materials and methods: </strong>The Cox model with time-varying coefficients relaxes the proportional hazards assumption of the usual Cox model and is particularly useful to model time-to-event outcomes. We proposed a One-shot Distributed Algorithm to fit multi-center Cox regression models with Time varying coefficients (ODACT). This algorithm constructed a surrogate likelihood function to approximate the Cox partial likelihood function, using patient-level data from a lead site and aggregated data from other sites. The performance of ODACT was demonstrated by simulation and a real-world study of opioid use disorder (OUD) using decentralized data from a large clinical research network across 5 sites with 69 163 subjects.</p><p><strong>Results: </strong>The ODACT method precisely estimated the time-varying effects over time. In the simulation study, ODACT always achieved estimation close to that of the pooled analysis, while the meta-estimator showed considerable amount of bias. In the OUD study, the bias of the estimated hazard ratios by ODACT are smaller than those of the meta-estimator for all 7 risk factors at almost all of the time points from 0 to 2.5 years. 
The greatest bias of the meta-estimator was for the effects of age ≥65 years and smoking.</p><p><strong>Conclusion: </strong>ODACT is a privacy-preserving and communication-efficient method for analyzing multi-center time-to-event data, which allows the covariates' effects to be time-varying. ODACT provides estimates close to the pooled estimator and substantially outperforms the meta-analysis estimator.</p><p><strong>Discussion: </strong>The proposed ODACT is a privacy-preserving distributed algorithm for fitting Cox models with time-varying coefficients. Limitations of ODACT include that privacy preservation via aggregate data relies on a relatively large amount of data at each individual site, and that rigorous quantification of the risk of privacy leakage requires further investigation.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"656-664"},"PeriodicalIF":4.7,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12005629/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143048666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
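The core of ODACT is a one-shot surrogate likelihood: the lead site maximizes its own likelihood plus a linear correction built from one round of aggregated gradients. The sketch below illustrates that general ODAC-family construction on logistic regression rather than the time-varying Cox model, with simulated data; it is a conceptual toy, not the paper's algorithm.

```python
import numpy as np

def loglik_grad(beta, X, y):
    """Average gradient of the logistic log-likelihood at one site."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return X.T @ (y - p) / len(y)

# Simulate patient-level data at 3 sites (hypothetical, iid for simplicity)
rng = np.random.default_rng(42)
beta_true = np.array([0.5, -1.0])
sites = []
for _ in range(3):
    X = rng.normal(size=(500, 2))
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
    sites.append((X, y))

# One communication round: every site sends its gradient at a shared
# initial value (zero here; in practice a meta-analysis estimate).
beta0 = np.zeros(2)
global_grad = np.mean([loglik_grad(beta0, X, y) for X, y in sites], axis=0)

# The lead site (site 0) maximizes the surrogate likelihood
#   L_surr(b) = L_lead(b) + (global_grad - lead_grad)^T b
# using only its own patient-level data plus the aggregated gradients.
X0, y0 = sites[0]
correction = global_grad - loglik_grad(beta0, X0, y0)

beta = beta0.copy()
lr = 0.5
for _ in range(2000):
    # gradient ascent on the (concave) surrogate likelihood
    beta += lr * (loglik_grad(beta, X0, y0) + correction)

print(beta)  # one-shot surrogate estimate of beta_true
```

The linear correction term makes the lead-site estimator agree with the pooled estimator to first order, which is why ODAC-style estimators track the pooled analysis far more closely than simple meta-analysis.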
Swaminathan Kandaswamy, Julia K W Yarahuan, Elizabeth A Dobler, Matthew J Molloy, Lindsey A Knake, Sean M Hernandez, Anne A Fallon, Lauren M Hess, Allison B McCoy, Regine M Fortunov, Eric S Kirkendall, Naveen Muthu, Evan W Orenstein, Adam C Dziorny, Juan D Chaparro
{"title":"Alert design in the real world: a cross-sectional analysis of interruptive alerting at 9 academic pediatric health systems.","authors":"Swaminathan Kandaswamy, Julia K W Yarahuan, Elizabeth A Dobler, Matthew J Molloy, Lindsey A Knake, Sean M Hernandez, Anne A Fallon, Lauren M Hess, Allison B McCoy, Regine M Fortunov, Eric S Kirkendall, Naveen Muthu, Evan W Orenstein, Adam C Dziorny, Juan D Chaparro","doi":"10.1093/jamia/ocaf013","DOIUrl":"10.1093/jamia/ocaf013","url":null,"abstract":"<p><strong>Objective: </strong>To assess the prevalence of recommended design elements in implemented electronic health record (EHR) interruptive alerts across pediatric care settings.</p><p><strong>Materials and methods: </strong>We conducted a 3-phase mixed-methods cross-sectional study. Phase 1 involved developing a codebook for alert content classification. Phase 2 identified the most frequently interruptive alerts at participating sites. Phase 3 applied the codebook to classify alerts. Inter-rater reliability (IRR) for the codebook and descriptive statistics for alert design contents were reported.</p><p><strong>Results: </strong>We classified alert content on design elements such as the rationale for the alert's appearance, the hazard of ignoring it, directive versus informational content, administrative purpose, and whether it aligned with one of the Institute of Medicine's (IOM) domains of healthcare quality. Most design elements achieved an IRR above 0.7, with the exceptions for identifying directive content outside of an alert (IRR 0.58) and whether an alert was for administrative purposes only (IRR 0.36). IRR was poor for all IOM domains except equity. Institutions varied widely in the number of unique alerts and their designs. 78% of alerts stated their purpose, over half were directive, and 13% were informational. 
Only 2%-20% of alerts explained the consequences of inaction.</p><p><strong>Discussion: </strong>This study raises important questions about the optimal balance of alert functions and desirable features of alert representation.</p><p><strong>Conclusion: </strong>Our study provides the first multi-center analysis of EHR alert design elements in pediatric care settings, revealing substantial variation in content and design. These findings underline the need for future research to experimentally explore EHR alert design best practices to improve efficiency and effectiveness.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"682-688"},"PeriodicalIF":4.7,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12005624/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143191202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dean F Sittig, Trisha Flanagan, Patricia Sengstack, Rosann T Cholankeril, Sara Ehsan, Amanda Heidemann, Daniel R Murphy, Hojjat Salmasian, Jason S Adelman, Hardeep Singh
{"title":"Revisions to the Safety Assurance Factors for Electronic Health Record Resilience (SAFER) Guides to update national recommendations for safe use of electronic health records.","authors":"Dean F Sittig, Trisha Flanagan, Patricia Sengstack, Rosann T Cholankeril, Sara Ehsan, Amanda Heidemann, Daniel R Murphy, Hojjat Salmasian, Jason S Adelman, Hardeep Singh","doi":"10.1093/jamia/ocaf018","DOIUrl":"https://doi.org/10.1093/jamia/ocaf018","url":null,"abstract":"<p><p>The Safety Assurance Factors for Electronic Health Record (EHR) Resilience (SAFER) Guides provide recommendations to healthcare organizations for conducting proactive self-assessments of the safety and effectiveness of their EHR implementation and use. Originally released in 2014, they were last updated in 2016. In 2022, the Centers for Medicare and Medicaid Services required their annual attestation by US hospitals.</p><p><strong>Objectives: </strong>This case study describes how SAFER Guide recommendations were updated to align with current evidence and clinical practice.</p><p><strong>Materials and methods: </strong>Over nine months, a multidisciplinary team updated SAFER Guides through literature reviews, iterative feedback, and online meetings.</p><p><strong>Results: </strong>We reduced the number of recommended practices across all Guides by 40% and consolidated 9 Guides into 8 to maximize ease of use, feasibility, and utility. We provide a 4-level evidence grading hierarchy for each recommendation and a new 5-point rating scale to self-assess implementation status of the recommendation. 
We included 429 citations, of which 289 (67%) were published since the 2016 revision.</p><p><strong>Discussion: </strong>The SAFER Guides were revised to offer EHR best practices, adaptable to unique organizational needs, with interactive content available at: https://www.healthit.gov/topic/safety/safer-guides.</p><p><strong>Conclusion: </strong>Revisions ensure that the 2025 SAFER Guides represent the best available current evidence for EHR developers and healthcare organizations.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":"32 4","pages":"755-760"},"PeriodicalIF":4.7,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12005625/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143990402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advancing the application and evaluation of large language models in health and biomedicine.","authors":"Suzanne Bakken","doi":"10.1093/jamia/ocaf043","DOIUrl":"https://doi.org/10.1093/jamia/ocaf043","url":null,"abstract":"","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":"32 4","pages":"603-604"},"PeriodicalIF":4.7,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12005626/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144013574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Muhammad Ali Khan, Umair Ayub, Syed Arsalan Ahmed Naqvi, Kaneez Zahra Rubab Khakwani, Zaryab Bin Riaz Sipra, Ammad Raina, Sihan Zhou, Huan He, Amir Saeidi, Bashar Hasan, Robert Bryan Rumble, Danielle S Bitterman, Jeremy L Warner, Jia Zou, Amye J Tevaarwerk, Konstantinos Leventakos, Kenneth L Kehl, Jeanne M Palmer, Mohammad Hassan Murad, Chitta Baral, Irbaz Bin Riaz
{"title":"Collaborative large language models for automated data extraction in living systematic reviews.","authors":"Muhammad Ali Khan, Umair Ayub, Syed Arsalan Ahmed Naqvi, Kaneez Zahra Rubab Khakwani, Zaryab Bin Riaz Sipra, Ammad Raina, Sihan Zhou, Huan He, Amir Saeidi, Bashar Hasan, Robert Bryan Rumble, Danielle S Bitterman, Jeremy L Warner, Jia Zou, Amye J Tevaarwerk, Konstantinos Leventakos, Kenneth L Kehl, Jeanne M Palmer, Mohammad Hassan Murad, Chitta Baral, Irbaz Bin Riaz","doi":"10.1093/jamia/ocae325","DOIUrl":"10.1093/jamia/ocae325","url":null,"abstract":"<p><strong>Objective: </strong>Data extraction from the published literature is the most laborious step in conducting living systematic reviews (LSRs). We aim to build a generalizable, automated data extraction workflow leveraging large language models (LLMs) that mimics the real-world 2-reviewer process.</p><p><strong>Materials and methods: </strong>A dataset of 10 trials (22 publications) from a published LSR was used, focusing on 23 variables related to trial, population, and outcomes data. The dataset was split into prompt development (n = 5) and held-out test sets (n = 17). GPT-4-turbo and Claude-3-Opus were used for data extraction. Responses from the 2 LLMs were considered concordant if they were the same for a given variable. The discordant responses from each LLM were provided to the other LLM for cross-critique. Accuracy, ie, the total number of correct responses divided by the total number of responses, was computed to assess performance.</p><p><strong>Results: </strong>In the prompt development set, 110 (96%) responses were concordant, achieving an accuracy of 0.99 against the gold standard. In the test set, 342 (87%) responses were concordant. The accuracy of the concordant responses was 0.94. The accuracy of the discordant responses was 0.41 for GPT-4-turbo and 0.50 for Claude-3-Opus. 
Of the 49 discordant responses, 25 (51%) became concordant after cross-critique, increasing accuracy to 0.76.</p><p><strong>Discussion: </strong>Concordant responses by the LLMs are likely to be accurate. In instances of discordant responses, cross-critique can further increase the accuracy.</p><p><strong>Conclusion: </strong>Large language models, when simulated in a collaborative, 2-reviewer workflow, can extract data with reasonable performance, enabling truly \"living\" systematic reviews.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"638-647"},"PeriodicalIF":4.7,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12005628/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143015012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
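The 2-reviewer LLM workflow described above (keep concordant answers; on discordance, show each model the other's answer and accept only if they then agree) reduces to simple control flow. The sketch below uses canned stand-in responses and critique functions; in the study these would be API calls to GPT-4-turbo and Claude-3-Opus, and all variable names and values here are hypothetical.

```python
def reconcile(responses_a, responses_b, critique_a, critique_b):
    """Mimic the 2-reviewer extraction workflow: keep concordant answers,
    and for discordant ones let each model reconsider given the other's
    answer (cross-critique), accepting the pair only if they then agree."""
    final, unresolved = {}, []
    for var in responses_a:
        a, b = responses_a[var], responses_b[var]
        if a == b:                  # concordant on the first pass
            final[var] = a
            continue
        a2 = critique_a(var, b)     # model A reconsiders given B's answer
        b2 = critique_b(var, a)     # model B reconsiders given A's answer
        if a2 == b2:                # concordant after cross-critique
            final[var] = a2
        else:
            unresolved.append(var)  # route to a human reviewer
    return final, unresolved

# Canned responses for one publication (illustrative values only).
resp_a = {"n_enrolled": "312", "primary_endpoint": "OS", "median_age": "64"}
resp_b = {"n_enrolled": "312", "primary_endpoint": "PFS", "median_age": "63"}
# Simulated critiques: both models converge on "PFS" for the endpoint,
# but neither budges on median age.
crit_a = lambda var, other: "PFS" if var == "primary_endpoint" else resp_a[var]
crit_b = lambda var, other: "PFS" if var == "primary_endpoint" else resp_b[var]

final, unresolved = reconcile(resp_a, resp_b, crit_a, crit_b)
print(final)       # {'n_enrolled': '312', 'primary_endpoint': 'PFS'}
print(unresolved)  # ['median_age']
```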
{"title":"Evaluating robustly standardized explainable anomaly detection of implausible variables in cancer data.","authors":"Philipp Röchner, Franz Rothlauf","doi":"10.1093/jamia/ocaf011","DOIUrl":"10.1093/jamia/ocaf011","url":null,"abstract":"<p><strong>Objectives: </strong>Explanations help to understand why anomaly detection algorithms identify data as anomalous. This study evaluates whether robustly standardized explanation scores correctly identify the implausible variables that make cancer data anomalous.</p><p><strong>Materials and methods: </strong>The dataset analyzed consists of 18 587 truncated real-world cancer registry records containing 8 categorical variables describing patients diagnosed with bladder and lung tumors. We identified 800 anomalous records using an autoencoder's per-record reconstruction error, which is a common neural network-based anomaly detection approach. For each variable of a record, we determined a robust explanation score, which indicates how anomalous the variable is. A variable's robust explanation score is the autoencoder's per-variable reconstruction error measured by cross-entropy and robustly standardized across records; that is, large reconstruction errors have a small effect on standardization. To evaluate the explanation scores, medical coders identified the implausible variables of the anomalous records. We then compare the explanation scores to the medical coders' validation in a classification and ranking setting. As baselines, we identified anomalous variables using the raw autoencoder's per-variable reconstruction error, the non-robustly standardized per-variable reconstruction error, the empirical frequency of implausible variables according to the medical coders' validation, and random selection or ranking of variables.</p><p><strong>Results: </strong>When we sort the variables by their robust explanation scores, on average, the 2.37 highest-ranked variables contain all implausible variables. 
For the baselines, on average, the 2.84, 2.98, 3.27, and 4.91 highest-ranked variables contain all the variables that made a record implausible.</p><p><strong>Discussion: </strong>We found that explanations based on robust explanation scores were better than or as good as the baseline explanations examined in the classification and ranking settings. Due to the international standardization of cancer data coding, we expect our results to generalize to other cancer types and registries. As we anticipate different magnitudes of per-variable autoencoder reconstruction errors in data from other medical registries and domains, these may also benefit from robustly standardizing the reconstruction errors per variable. Future work could explore methods to identify subsets of anomalous variables, addressing whether individual variables or their combinations contribute to anomalies. This direction aims to improve the interpretability and utility of anomaly detection systems.</p><p><strong>Conclusions: </strong>Robust explanation scores can improve explanations for identifying implausible variables in cancer data.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"724-735"},"PeriodicalIF":4.7,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12005620/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143054062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
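The abstract above hinges on robust standardization of per-variable reconstruction errors across records, so that a few huge errors barely shift the scale. A common choice for this is median/MAD standardization, sketched below on fabricated data (the paper does not specify its exact estimator, so the median/MAD form is an assumption for illustration).

```python
import numpy as np

def robust_z(errors, eps=1e-9):
    """Standardize per-variable reconstruction errors across records with
    the median and median absolute deviation (MAD), so extreme errors have
    little influence on the location and scale estimates."""
    med = np.median(errors, axis=0, keepdims=True)
    mad = np.median(np.abs(errors - med), axis=0, keepdims=True)
    # 1.4826 makes the MAD consistent with the std dev under normality
    return (errors - med) / (1.4826 * mad + eps)

rng = np.random.default_rng(0)
# 1000 records x 8 variables of per-variable cross-entropy errors,
# with a few extreme outliers injected into variable 0
errors = rng.gamma(shape=2.0, scale=0.5, size=(1000, 8))
errors[:5, 0] += 50.0

scores = robust_z(errors)

# Rank one anomalous record's variables by how anomalous each is
record = scores[0]
ranking = np.argsort(-record)
print(int(ranking[0]))  # 0: the injected outlier variable ranks first
```

Because the median and MAD ignore the tails, the five injected outliers do not inflate the scale for variable 0, which is exactly the property the paper exploits over plain mean/std standardization.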
{"title":"Improving large language model applications in biomedicine with retrieval-augmented generation: a systematic review, meta-analysis, and clinical development guidelines.","authors":"Siru Liu, Allison B McCoy, Adam Wright","doi":"10.1093/jamia/ocaf008","DOIUrl":"10.1093/jamia/ocaf008","url":null,"abstract":"<p><strong>Objective: </strong>The objectives of this study are to synthesize findings from recent research of retrieval-augmented generation (RAG) and large language models (LLMs) in biomedicine and provide clinical development guidelines to improve effectiveness.</p><p><strong>Materials and methods: </strong>We conducted a systematic literature review and a meta-analysis. The report was created in adherence to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses 2020 analysis. Searches were performed in 3 databases (PubMed, Embase, PsycINFO) using terms related to \"retrieval augmented generation\" and \"large language model,\" for articles published in 2023 and 2024. We selected studies that compared baseline LLM performance with RAG performance. We developed a random-effect meta-analysis model, using odds ratio as the effect size.</p><p><strong>Results: </strong>Among 335 studies, 20 were included in this literature review. The pooled effect size was 1.35, with a 95% confidence interval of 1.19-1.53, indicating a statistically significant effect (P = .001). We reported clinical tasks, baseline LLMs, retrieval sources and strategies, as well as evaluation methods.</p><p><strong>Discussion: </strong>Building on our literature review, we developed Guidelines for Unified Implementation and Development of Enhanced LLM Applications with RAG in Clinical Settings to inform clinical applications using RAG.</p><p><strong>Conclusion: </strong>Overall, RAG implementation showed a 1.35 odds ratio increase in performance compared to baseline LLMs. 
Future research should focus on (1) system-level enhancement: combining RAG with agents, (2) knowledge-level enhancement: deep integration of knowledge into LLMs, and (3) integration-level enhancement: integrating RAG systems within electronic health records.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"605-615"},"PeriodicalIF":4.7,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12005634/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142985309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
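The review pools study-level odds ratios with a random-effects model. A minimal DerSimonian-Laird sketch of that pooling is below; the per-study log odds ratios and variances are fabricated for illustration and are not the data from this review.

```python
import math

def dersimonian_laird(log_ors, variances):
    """Random-effects pooling of log odds ratios (DerSimonian-Laird).
    Returns the pooled OR and its 95% confidence interval."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    # Cochran's Q and the DL estimate of between-study variance tau^2
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Re-weight with tau^2 added to each study's variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    lo, hi = pooled - 1.96 * se, pooled + 1.96 * se
    return math.exp(pooled), math.exp(lo), math.exp(hi)

# Hypothetical per-study log odds ratios and variances
log_ors = [0.40, 0.15, 0.55, 0.25, 0.35]
variances = [0.04, 0.06, 0.09, 0.05, 0.07]
or_, lo, hi = dersimonian_laird(log_ors, variances)
print(f"pooled OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

When Cochran's Q falls below its degrees of freedom (little between-study heterogeneity), tau^2 is truncated to zero and the random-effects result coincides with the fixed-effect one, as it does for these toy inputs.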