Magdalena T Weber, Richard Noll, Alexandra Marchl, Carlo Facchinello, Achim Grünewaldt, Christian Hügel, Khader Musleh, Thomas O F Wagner, Holger Storf, Jannik Schaaf
{"title":"MedBot vs RealDoc: efficacy of large language modeling in physician-patient communication for rare diseases.","authors":"Magdalena T Weber, Richard Noll, Alexandra Marchl, Carlo Facchinello, Achim Grünewaldt, Christian Hügel, Khader Musleh, Thomas O F Wagner, Holger Storf, Jannik Schaaf","doi":"10.1093/jamia/ocaf034","DOIUrl":"10.1093/jamia/ocaf034","url":null,"abstract":"<p><strong>Objectives: </strong>This study assesses the abilities of 2 large language models (LLMs), GPT-4 and BioMistral 7B, in responding to patient queries, particularly concerning rare diseases, and compares their performance with that of physicians.</p><p><strong>Materials and methods: </strong>A total of 103 patient queries and corresponding physician answers were extracted from EXABO, a question-answering forum dedicated to rare respiratory diseases. The responses provided by physicians and generated by LLMs were ranked on a Likert scale by a panel of 4 experts based on 4 key quality criteria for health communication: correctness, comprehensibility, relevance, and empathy.</p><p><strong>Results: </strong>The performance of generative pretrained transformer 4 (GPT-4) was significantly better than the performance of the physicians and BioMistral 7B. While the overall ranking considers GPT-4's responses to be mostly correct, comprehensible, relevant, and empathetic, the responses provided by BioMistral 7B were only partially correct and empathetic. The responses given by physicians rank in between. The experts concur that an LLM could lighten the load for physicians, but rigorous validation is considered essential to guarantee dependability and efficacy.</p><p><strong>Discussion: </strong>Open-source models such as BioMistral 7B offer the advantage of privacy by running locally in health-care settings. GPT-4, on the other hand, demonstrates proficiency in communication and knowledge depth. 
However, challenges persist, including the management of response variability, the balancing of comprehensibility with medical accuracy, and the assurance of consistent performance across different languages.</p><p><strong>Conclusion: </strong>The performance of GPT-4 underscores the potential of LLMs in facilitating physician-patient communication. However, it is imperative that these systems are handled with care, as erroneous responses have the potential to cause harm without the requisite validation procedures.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"775-783"},"PeriodicalIF":4.7,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012358/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143505806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sahil Sandhu, Michael Liu, Laura M Gottlieb, A Jay Holmgren, Lisa S Rotenstein, Matthew S Pantell
{"title":"Interoperability of health-related social needs data at US hospitals.","authors":"Sahil Sandhu, Michael Liu, Laura M Gottlieb, A Jay Holmgren, Lisa S Rotenstein, Matthew S Pantell","doi":"10.1093/jamia/ocaf049","DOIUrl":"10.1093/jamia/ocaf049","url":null,"abstract":"<p><strong>Objective: </strong>To measure hospital engagement in interoperable exchange of health-related social needs (HRSN) data.</p><p><strong>Materials and methods: </strong>This study combined national data from the 2022 American Hospital Association (AHA) Annual Survey, AHA IT Supplement, and the Centers for Medicare and Medicaid Services Impact File. Multivariable logistic regression was used to identify hospital characteristics associated with receiving HRSN data from external organizations.</p><p><strong>Results: </strong>Of 2502 hospitals, 61.4% reported electronically receiving HRSN data from external sources, most commonly from health information exchange organizations. Hospitals participating in accountable care organizations or patient-centered medical homes and hospitals using Epic or Cerner electronic health records (EHRs) were more likely to receive external HRSN data. 
In contrast, for-profit hospitals and public hospitals were less likely to participate in HRSN data exchange.</p><p><strong>Discussion: </strong>Hospital ownership, participation in value-based care models, and EHR vendor capabilities are important drivers in advancing HRSN data exchange.</p><p><strong>Conclusion: </strong>Additional policy and technological support may be needed to enhance HRSN data interoperability.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"914-919"},"PeriodicalIF":4.7,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012371/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143674776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joshua M Biro, Jessica L Handley, James Mickler, Sahithi Reddy, Varsha Kottamasu, Raj M Ratwani, Nathan K Cobb
{"title":"The value of simulation testing for the evaluation of ambient digital scribes: a case report.","authors":"Joshua M Biro, Jessica L Handley, James Mickler, Sahithi Reddy, Varsha Kottamasu, Raj M Ratwani, Nathan K Cobb","doi":"10.1093/jamia/ocaf052","DOIUrl":"10.1093/jamia/ocaf052","url":null,"abstract":"<p><strong>Objectives: </strong>The objective of this work is to demonstrate the value of simulation testing for rapidly evaluating artificial intelligence (AI) products.</p><p><strong>Materials and methods: </strong>Researcher-physician teams simulated the use of 2 Ambient Digital Scribe (ADS) products by reading scripts of outpatient encounters while using both products, yielding a total of 44 draft notes. Time to edit, perceived amount of effort and editing, and errors in the AI-generated draft notes were analyzed.</p><p><strong>Results: </strong>Ambient Digital Scribe Product A draft notes took significantly longer to edit, had fewer omissions, and more additions and irrelevant or misplaced text errors than ADS Product B. Ambient Digital Scribe Product A was rated as performing better for most encounters.</p><p><strong>Discussion: </strong>Artificial intelligence-enabled products are being rapidly developed and implemented into practice, outpacing safety concerns. 
Simulation testing can efficiently identify safety issues.</p><p><strong>Conclusion: </strong>Simulation testing is a crucial first step to take when evaluating AI-enabled technologies.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"928-931"},"PeriodicalIF":4.7,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012335/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143674778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The health data utility and the resurgence of health information exchanges as a national resource.","authors":"Anjum Khurshid, Indra Neil Sarkar","doi":"10.1093/jamia/ocaf032","DOIUrl":"10.1093/jamia/ocaf032","url":null,"abstract":"<p><strong>Objectives: </strong>(1) Describe the evolution of Health Information Exchanges (HIEs) into Health Data Utilities (HDUs); (2) Provide motivation for HDUs as a public strategic investment target.</p><p><strong>Materials and methods: </strong>We examine trends in developing HIEs into HDUs and compare their criticality to that of the national highway system as an investment in the public good.</p><p><strong>Results: </strong>We propose that investment in HDUs is essential for our nation's healthcare data ecosystem. This investment will address the increased need for healthcare delivery and public health data.</p><p><strong>Discussion: </strong>HDUs can meet the current and future needs of healthcare delivery and public health surveillance. Their structure and capabilities will underpin their success to support data-driven decision-making.</p><p><strong>Conclusion: </strong>Transforming HIEs into HDUs is essential to realizing the vision of a distributed and connected healthcare data system. 
Public funding is critical for this model's success, similar to the continued investment in the national highway system.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"964-967"},"PeriodicalIF":4.7,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012352/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143558515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Katherine E Brown, Chao Yan, Zhuohang Li, Xinmeng Zhang, Benjamin X Collins, You Chen, Ellen Wright Clayton, Murat Kantarcioglu, Yevgeniy Vorobeychik, Bradley A Malin
{"title":"Large language models are less effective at clinical prediction tasks than locally trained machine learning models.","authors":"Katherine E Brown, Chao Yan, Zhuohang Li, Xinmeng Zhang, Benjamin X Collins, You Chen, Ellen Wright Clayton, Murat Kantarcioglu, Yevgeniy Vorobeychik, Bradley A Malin","doi":"10.1093/jamia/ocaf038","DOIUrl":"10.1093/jamia/ocaf038","url":null,"abstract":"<p><strong>Objectives: </strong>To determine the extent to which current large language models (LLMs) can serve as substitutes for traditional machine learning (ML) as clinical predictors using data from electronic health records (EHRs), we investigated various factors that can impact their adoption, including overall performance, calibration, fairness, and resilience to privacy protections that reduce data fidelity.</p><p><strong>Materials and methods: </strong>We evaluated GPT-3.5, GPT-4, and traditional ML (as gradient-boosting trees) on clinical prediction tasks in EHR data from Vanderbilt University Medical Center (VUMC) and MIMIC IV. We measured predictive performance with area under the receiver operating characteristic (AUROC) and model calibration using Brier Score. To evaluate the impact of data privacy protections, we assessed AUROC when demographic variables are generalized. We evaluated algorithmic fairness using equalized odds and statistical parity across race, sex, and age of patients. 
We also considered the impact of using in-context learning by incorporating labeled examples within the prompt.</p><p><strong>Results: </strong>Traditional ML [AUROC: 0.847, 0.894 (VUMC, MIMIC)] substantially outperformed GPT-3.5 (AUROC: 0.537, 0.517) and GPT-4 (AUROC: 0.629, 0.602) (with and without in-context learning) in predictive performance and output probability calibration [Brier Score (ML vs GPT-3.5 vs GPT-4): 0.134 vs 0.384 vs 0.251, 0.042 vs 0.06 vs 0.219].</p><p><strong>Discussion: </strong>Traditional ML is more robust than GPT-3.5 and GPT-4 in generalizing demographic information to protect privacy. GPT-4 is the fairest model according to our selected metrics but at the cost of poor model performance.</p><p><strong>Conclusion: </strong>These findings suggest that non-fine-tuned LLMs are less effective and robust than locally trained ML for clinical prediction tasks, but they are improving across releases.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"811-822"},"PeriodicalIF":4.7,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012369/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143582390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Serena Jinchen Xie, Carolin Spice, Patrick Wedgeworth, Raina Langevin, Kevin Lybarger, Angad Preet Singh, Brian R Wood, Jared W Klein, Gary Hsieh, Herbert C Duber, Andrea L Hartzler
{"title":"Patient and clinician acceptability of automated extraction of social drivers of health from clinical notes in primary care.","authors":"Serena Jinchen Xie, Carolin Spice, Patrick Wedgeworth, Raina Langevin, Kevin Lybarger, Angad Preet Singh, Brian R Wood, Jared W Klein, Gary Hsieh, Herbert C Duber, Andrea L Hartzler","doi":"10.1093/jamia/ocaf046","DOIUrl":"10.1093/jamia/ocaf046","url":null,"abstract":"<p><strong>Objective: </strong>Artificial Intelligence (AI)-based approaches for extracting Social Drivers of Health (SDoH) from clinical notes offer healthcare systems an efficient way to identify patients' social needs, yet we know little about the acceptability of this approach to patients and clinicians. We investigated patient and clinician acceptability through interviews.</p><p><strong>Materials and methods: </strong>We interviewed primary care patients experiencing social needs (n = 19) and clinicians (n = 14) about their acceptability of \"SDoH autosuggest,\" an AI-based approach for extracting SDoH from clinical notes. We presented storyboards depicting the approach and asked participants to rate their acceptability and discuss their rationale.</p><p><strong>Results: </strong>Participants rated SDoH autosuggest moderately acceptable (mean = 3.9/5 patients; mean = 3.6/5 clinicians). Patients' ratings varied across domains, with substance use rated most and employment rated least acceptable. Both groups raised concern about information integrity, actionability, impact on clinical interactions and relationships, and privacy. In addition, patients raised concern about transparency, autonomy, and potential harm, whereas clinicians raised concern about usability.</p><p><strong>Discussion: </strong>Despite reporting moderate acceptability of the envisioned approach, patients and clinicians expressed multiple concerns about AI systems that extract SDoH. 
Participants emphasized the need for high-quality data, non-intrusive presentation methods, and clear communication strategies regarding sensitive social needs. Findings underscore the importance of engaging patients and clinicians to mitigate unintended consequences when integrating AI approaches into care.</p><p><strong>Conclusion: </strong>Although AI approaches like SDoH autosuggest hold promise for efficiently identifying SDoH from clinical notes, they must also account for concerns of patients and clinicians to ensure these systems are acceptable and do not undermine trust.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"855-865"},"PeriodicalIF":4.7,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012364/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143626628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tom Arthur, Sophie Robinson, Samuel Vine, Lauren Asare, G J Melendez-Torres
{"title":"Equity implications of extended reality technologies for health and procedural anxiety: a systematic review and implementation-focused framework.","authors":"Tom Arthur, Sophie Robinson, Samuel Vine, Lauren Asare, G J Melendez-Torres","doi":"10.1093/jamia/ocaf047","DOIUrl":"10.1093/jamia/ocaf047","url":null,"abstract":"<p><strong>Objectives: </strong>Extended reality (XR) applications are gaining support as a method of reducing anxieties about medical treatments and conditions; however, their impacts on health service inequalities remain underresearched. We therefore undertook a synthesis of evidence relating to the equity implications of these types of interventions.</p><p><strong>Materials and methods: </strong>Searches of MEDLINE, Embase, APA PsycINFO, and Epistemonikos were conducted in May 2023 to identify reviews of patient-directed XR interventions for health and procedural anxiety. Equity-relevant data were extracted from records (n = 56) that met these criteria, and from individual trials (n = 63) evaluated within 5 priority reviews. Analyses deductively categorized data into salient situation- and technology-related mechanisms, which were then developed into a novel implementation-focused framework.</p><p><strong>Results: </strong>Analyses highlighted various mechanisms that impact on the availability, accessibility, and/or acceptability of services aiming to reduce patient health and procedural anxieties. On one hand, results showed that XR solutions offer unique opportunities for addressing health inequities, especially those concerning transport, cost, or mobility barriers. 
At the same time, however, these interventions can accelerate areas of inequity or even engender additional disparities.</p><p><strong>Discussion: </strong>Our \"double jeopardy, common impact\" framework outlines unique pathways through which XR could help address health disparities, but also accelerate or even generate inequity across different systems, communities, and individuals. This framework can be used to guide prospective interventions and assessments.</p><p><strong>Conclusion: </strong>Despite growing positive assertions about XR's capabilities for managing patient anxieties, we emphasize the need for taking a cautious, inclusive approach to implementation in future programs.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"945-957"},"PeriodicalIF":4.7,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012361/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143671450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rohan Khera, Mitsuaki Sawano, Frederick Warner, Andreas Coppi, Aline F Pedroso, Erica S Spatz, Huihui Yu, Michael Gottlieb, Sharon Saydah, Kari A Stephens, Kristin L Rising, Joann G Elmore, Mandy J Hill, Ahamed H Idris, Juan Carlos C Montoy, Kelli N O'Laughlin, Robert A Weinstein, Arjun Venkatesh
{"title":"Assessment of health conditions from patient electronic health record portals vs self-reported questionnaires: an analysis of the INSPIRE study.","authors":"Rohan Khera, Mitsuaki Sawano, Frederick Warner, Andreas Coppi, Aline F Pedroso, Erica S Spatz, Huihui Yu, Michael Gottlieb, Sharon Saydah, Kari A Stephens, Kristin L Rising, Joann G Elmore, Mandy J Hill, Ahamed H Idris, Juan Carlos C Montoy, Kelli N O'Laughlin, Robert A Weinstein, Arjun Venkatesh","doi":"10.1093/jamia/ocaf027","DOIUrl":"10.1093/jamia/ocaf027","url":null,"abstract":"<p><strong>Objectives: </strong>Direct electronic access to multiple electronic health record (EHR) systems through patient portals offers a novel avenue for decentralized research. Given the critical value of patient characterization, we sought to compare computable evaluation of health conditions from patient-portal EHR against the traditional self-report.</p><p><strong>Materials and methods: </strong>In the nationwide Innovative Support for Patients with SARS-CoV-2 Infections Registry (INSPIRE) study, which linked self-reported questionnaires with multiplatform patient-portal EHR data, we compared self-reported health conditions across different clinical domains against computable definitions based on diagnosis codes, medications, vital signs, and laboratory testing. We assessed their concordance using Cohen's Kappa and the prognostic significance of differentially captured features as predictors of 1-year all-cause hospitalization risk.</p><p><strong>Results: </strong>Among 1683 participants (mean age 41 ± 15 years, 67% female, 63% non-Hispanic Whites), the prevalence of conditions varied substantially between EHR and self-report (-13.2% to +11.6% across definitions). Compared with comprehensive EHR phenotypes, self-report under-captured all conditions, including hypertension (27.9% vs 16.2%), diabetes (10.1% vs 6.2%), and heart disease (8.5% vs 4.3%). However, diagnosis codes alone were insufficient. 
The risk for 1-year hospitalization was better defined by the same features from patient-portal EHR (area under the receiver operating curve [AUROC] 0.79) than from self-report (AUROC 0.68).</p><p><strong>Discussion: </strong>EHR-derived computable phenotypes identified a higher prevalence of comorbidities than self-report, with prognostic value of additionally identified features. However, definitions based solely on diagnosis codes often undercaptured self-reported conditions, suggesting a role of broader EHR elements.</p><p><strong>Conclusion: </strong>In this nationwide study, patient-portal-derived EHR data enabled extensive capture of patient characteristics across multiple EHR platforms, allowing better disease phenotyping compared with self-report.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"784-794"},"PeriodicalIF":4.7,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012333/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143558470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fahad Kamran, Donna Tjandra, Thomas S Valley, Hallie C Prescott, Nigam H Shah, Vincent X Liu, Eric Horvitz, Jenna Wiens
{"title":"Reformulating patient stratification for targeting interventions by accounting for severity of downstream outcomes resulting from disease onset: a case study in sepsis.","authors":"Fahad Kamran, Donna Tjandra, Thomas S Valley, Hallie C Prescott, Nigam H Shah, Vincent X Liu, Eric Horvitz, Jenna Wiens","doi":"10.1093/jamia/ocaf036","DOIUrl":"10.1093/jamia/ocaf036","url":null,"abstract":"<p><strong>Objectives: </strong>To quantify differences between (1) stratifying patients by predicted disease onset risk alone and (2) stratifying by predicted disease onset risk and severity of downstream outcomes. We perform a case study of predicting sepsis.</p><p><strong>Materials and methods: </strong>We performed a retrospective analysis using observational data from Michigan Medicine at the University of Michigan (U-M) between 2016 and 2020 and the Beth Israel Deaconess Medical Center (BIDMC) between 2008 and 2012. We measured the correlation between the estimated sepsis risk and the estimated effect of sepsis on mortality using Spearman's correlation. We compared patients stratified by sepsis risk with patients stratified by sepsis risk and effect of sepsis on mortality.</p><p><strong>Results: </strong>The U-M and BIDMC cohorts included 7282 and 5942 ICU visits; 7.9% and 8.1% developed sepsis, respectively. Among visits with sepsis, 21.9% and 26.3% experienced mortality at U-M and BIDMC. The effect of sepsis on mortality was weakly correlated with sepsis risk (U-M: 0.35 [95% CI: 0.33-0.37], BIDMC: 0.31 [95% CI: 0.28-0.34]). High-risk patients identified by both stratification approaches overlapped by 66.8% and 52.8% at U-M and BIDMC, respectively. 
Accounting for risk of mortality identified an older population (U-M: age = 66.0 [interquartile range (IQR): 55.0-74.0] vs age = 63.0 [IQR: 51.0-72.0], BIDMC: age = 74.0 [IQR: 61.0-83.0] vs age = 68.0 [IQR: 59.0-78.0]).</p><p><strong>Discussion: </strong>Predictive models that guide selective interventions ignore the effect of disease on downstream outcomes. Reformulating patient stratification to account for the estimated effect of disease on downstream outcomes identifies a different population compared to stratification on disease risk alone.</p><p><strong>Conclusion: </strong>Models that predict the risk of disease and ignore the effects of disease on downstream outcomes could be suboptimal for stratification.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"905-913"},"PeriodicalIF":4.7,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012354/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143701773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust privacy amidst innovation with large language models through a critical assessment of the risks.","authors":"Yao-Shun Chuang, Atiquer Rahman Sarkar, Yu-Chun Hsu, Noman Mohammed, Xiaoqian Jiang","doi":"10.1093/jamia/ocaf037","DOIUrl":"10.1093/jamia/ocaf037","url":null,"abstract":"<p><strong>Objective: </strong>This study evaluates the integration of electronic health records (EHRs) and natural language processing (NLP) with large language models (LLMs) to enhance healthcare data management and patient care, focusing on using advanced language models to create secure, Health Insurance Portability and Accountability Act-compliant synthetic patient notes for global biomedical research.</p><p><strong>Materials and methods: </strong>The study used de-identified and re-identified versions of the MIMIC III dataset with GPT-3.5, GPT-4, and Mistral 7B to generate synthetic clinical notes. Text generation employed templates and keyword extraction for contextually relevant notes, with One-shot generation for comparison. Privacy was assessed by analyzing protected health information (PHI) occurrence and co-occurrence, while utility was evaluated by training an ICD-9 coder using synthetic notes. Text quality was measured using ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and cosine similarity metrics to compare synthetic notes with source notes for semantic similarity.</p><p><strong>Results: </strong>The analysis of PHI occurrence and text utility via the ICD-9 coding task showed that the keyword-based method had low risk and good performance. One-shot generation exhibited the highest PHI exposure and PHI co-occurrence, particularly in geographic location and date categories. The Normalized One-shot method achieved the highest classification accuracy. 
Re-identified data consistently outperformed de-identified data.</p><p><strong>Discussion: </strong>Privacy analysis revealed a critical balance between data utility and privacy protection, influencing future data use and sharing.</p><p><strong>Conclusion: </strong>This study shows that keyword-based methods can create synthetic clinical notes that protect privacy while retaining data usability, potentially improving clinical data sharing. The use of dummy PHIs to counter privacy attacks may offer better utility and privacy than traditional de-identification.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"885-892"},"PeriodicalIF":4.7,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012348/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143671455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}