Journal of the American Medical Informatics Association: latest articles

Large language models are less effective at clinical prediction tasks than locally trained machine learning models.
IF 4.7, Zone 2, Medicine
Journal of the American Medical Informatics Association. Pub Date: 2025-05-01. DOI: 10.1093/jamia/ocaf038
Katherine E Brown, Chao Yan, Zhuohang Li, Xinmeng Zhang, Benjamin X Collins, You Chen, Ellen Wright Clayton, Murat Kantarcioglu, Yevgeniy Vorobeychik, Bradley A Malin
Objectives: To determine the extent to which current large language models (LLMs) can serve as substitutes for traditional machine learning (ML) as clinical predictors using data from electronic health records (EHRs), we investigated various factors that can impact their adoption, including overall performance, calibration, fairness, and resilience to privacy protections that reduce data fidelity.
Materials and methods: We evaluated GPT-3.5, GPT-4, and traditional ML (gradient-boosting trees) on clinical prediction tasks in EHR data from Vanderbilt University Medical Center (VUMC) and MIMIC IV. We measured predictive performance with area under the receiver operating characteristic curve (AUROC) and model calibration using the Brier score. To evaluate the impact of data privacy protections, we assessed AUROC when demographic variables are generalized. We evaluated algorithmic fairness using equalized odds and statistical parity across race, sex, and age of patients. We also considered the impact of using in-context learning by incorporating labeled examples within the prompt.
Results: Traditional ML [AUROC: 0.847, 0.894 (VUMC, MIMIC)] substantially outperformed GPT-3.5 (AUROC: 0.537, 0.517) and GPT-4 (AUROC: 0.629, 0.602) (with and without in-context learning) in predictive performance and output probability calibration [Brier score (ML vs GPT-3.5 vs GPT-4): 0.134 vs 0.384 vs 0.251, 0.042 vs 0.06 vs 0.219].
Discussion: Traditional ML is more robust than GPT-3.5 and GPT-4 in generalizing demographic information to protect privacy. GPT-4 is the fairest model according to our selected metrics but at the cost of poor model performance.
Conclusion: These findings suggest that non-fine-tuned LLMs are less effective and robust than locally trained ML for clinical prediction tasks, but they are improving across releases.
Pages: 811-822
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012369/pdf/
Citations: 0
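The abstract above reports AUROC for discrimination and the Brier score for calibration. As a minimal sketch of what those two metrics compute (not the authors' evaluation code; the function names and the four-patient toy data are assumptions made for illustration), both can be written in plain Python:

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Hypothetical toy data: 1 = event occurred, probs = predicted risk.
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]

print(auroc(labels, probs))        # 3 of 4 positive/negative pairs are ranked correctly
print(brier_score(labels, probs))  # lower is better; 0 is a perfect score
```

An AUROC of 0.5 corresponds to ranking patients at random, which is why the GPT-3.5 values near 0.52 to 0.54 indicate almost no discriminative signal.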
Reformulating patient stratification for targeting interventions by accounting for severity of downstream outcomes resulting from disease onset: a case study in sepsis.
IF 4.7, Zone 2, Medicine
Journal of the American Medical Informatics Association. Pub Date: 2025-05-01. DOI: 10.1093/jamia/ocaf036
Fahad Kamran, Donna Tjandra, Thomas S Valley, Hallie C Prescott, Nigam H Shah, Vincent X Liu, Eric Horvitz, Jenna Wiens
Objectives: To quantify differences between (1) stratifying patients by predicted disease onset risk alone and (2) stratifying by predicted disease onset risk and severity of downstream outcomes. We perform a case study of predicting sepsis.
Materials and methods: We performed a retrospective analysis using observational data from Michigan Medicine at the University of Michigan (U-M) between 2016 and 2020 and the Beth Israel Deaconess Medical Center (BIDMC) between 2008 and 2012. We measured the correlation between the estimated sepsis risk and the estimated effect of sepsis on mortality using Spearman's correlation. We compared patients stratified by sepsis risk with patients stratified by sepsis risk and effect of sepsis on mortality.
Results: The U-M and BIDMC cohorts included 7282 and 5942 ICU visits; 7.9% and 8.1% developed sepsis, respectively. Among visits with sepsis, 21.9% and 26.3% experienced mortality at U-M and BIDMC. The effect of sepsis on mortality was weakly correlated with sepsis risk (U-M: 0.35 [95% CI: 0.33-0.37], BIDMC: 0.31 [95% CI: 0.28-0.34]). High-risk patients identified by both stratification approaches overlapped by 66.8% and 52.8% at U-M and BIDMC, respectively. Accounting for risk of mortality identified an older population (U-M: age = 66.0 [interquartile range (IQR): 55.0-74.0] vs age = 63.0 [IQR: 51.0-72.0], BIDMC: age = 74.0 [IQR: 61.0-83.0] vs age = 68.0 [IQR: 59.0-78.0]).
Discussion: Predictive models that guide selective interventions ignore the effect of disease on downstream outcomes. Reformulating patient stratification to account for the estimated effect of disease on downstream outcomes identifies a different population compared to stratification on disease risk alone.
Conclusion: Models that predict the risk of disease and ignore the effects of disease on downstream outcomes could be suboptimal for stratification.
Pages: 905-913
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012354/pdf/
Citations: 0
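The study above relates estimated sepsis risk to the estimated effect of sepsis on mortality using Spearman's correlation. The following is a rough sketch of that statistic (Pearson correlation applied to average ranks), assuming made-up per-visit values; the helper names are illustrative and are not taken from the paper:

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group order[i..j]
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for five ICU visits.
sepsis_risk = [0.1, 0.2, 0.3, 0.4, 0.5]
mortality_effect = [0.02, 0.01, 0.05, 0.03, 0.04]
rho = spearman(sepsis_risk, mortality_effect)
print(round(rho, 3))
```

A weak correlation, as reported in the abstract, means that ranking visits by risk alone and ranking them by risk combined with downstream effect will select partly different patients.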
Robust privacy amidst innovation with large language models through a critical assessment of the risks.
IF 4.7, Zone 2, Medicine
Journal of the American Medical Informatics Association. Pub Date: 2025-05-01. DOI: 10.1093/jamia/ocaf037
Yao-Shun Chuang, Atiquer Rahman Sarkar, Yu-Chun Hsu, Noman Mohammed, Xiaoqian Jiang
Objective: This study evaluates the integration of electronic health records (EHRs) and natural language processing (NLP) with large language models (LLMs) to enhance healthcare data management and patient care, focusing on using advanced language models to create secure, Health Insurance Portability and Accountability Act-compliant synthetic patient notes for global biomedical research.
Materials and methods: The study used de-identified and re-identified versions of the MIMIC III dataset with GPT-3.5, GPT-4, and Mistral 7B to generate synthetic clinical notes. Text generation employed templates and keyword extraction for contextually relevant notes, with One-shot generation for comparison. Privacy was assessed by analyzing protected health information (PHI) occurrence and co-occurrence, while utility was evaluated by training an ICD-9 coder using synthetic notes. Text quality was measured using ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and cosine similarity metrics to compare synthetic notes with source notes for semantic similarity.
Results: The analysis of PHI occurrence and text utility via the ICD-9 coding task showed that the keyword-based method had low risk and good performance. One-shot generation exhibited the highest PHI exposure and PHI co-occurrence, particularly in geographic location and date categories. The Normalized One-shot method achieved the highest classification accuracy. Re-identified data consistently outperformed de-identified data.
Discussion: Privacy analysis revealed a critical balance between data utility and privacy protection, influencing future data use and sharing.
Conclusion: This study shows that keyword-based methods can create synthetic clinical notes that protect privacy while retaining data usability, potentially improving clinical data sharing. The use of dummy PHIs to counter privacy attacks may offer better utility and privacy than traditional de-identification.
Pages: 885-892
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012348/pdf/
Citations: 0
Assessment of health conditions from patient electronic health record portals vs self-reported questionnaires: an analysis of the INSPIRE study.
IF 4.7, Zone 2, Medicine
Journal of the American Medical Informatics Association. Pub Date: 2025-05-01. DOI: 10.1093/jamia/ocaf027
Rohan Khera, Mitsuaki Sawano, Frederick Warner, Andreas Coppi, Aline F Pedroso, Erica S Spatz, Huihui Yu, Michael Gottlieb, Sharon Saydah, Kari A Stephens, Kristin L Rising, Joann G Elmore, Mandy J Hill, Ahamed H Idris, Juan Carlos C Montoy, Kelli N O'Laughlin, Robert A Weinstein, Arjun Venkatesh
Objectives: Direct electronic access to multiple electronic health record (EHR) systems through patient portals offers a novel avenue for decentralized research. Given the critical value of patient characterization, we sought to compare computable evaluation of health conditions from patient-portal EHR against the traditional self-report.
Materials and methods: In the nationwide Innovative Support for Patients with SARS-CoV-2 Infections Registry (INSPIRE) study, which linked self-reported questionnaires with multiplatform patient-portal EHR data, we compared self-reported health conditions across different clinical domains against computable definitions based on diagnosis codes, medications, vital signs, and laboratory testing. We assessed their concordance using Cohen's Kappa and the prognostic significance of differentially captured features as predictors of 1-year all-cause hospitalization risk.
Results: Among 1683 participants (mean age 41 ± 15 years, 67% female, 63% non-Hispanic White), the prevalence of conditions varied substantially between EHR and self-report (-13.2% to +11.6% across definitions). Compared with comprehensive EHR phenotypes, self-report under-captured all conditions, including hypertension (27.9% vs 16.2%), diabetes (10.1% vs 6.2%), and heart disease (8.5% vs 4.3%). However, diagnosis codes alone were insufficient. The risk for 1-year hospitalization was better defined by the same features from patient-portal EHR (area under the receiver operating characteristic curve [AUROC] 0.79) than from self-report (AUROC 0.68).
Discussion: EHR-derived computable phenotypes identified a higher prevalence of comorbidities than self-report, with prognostic value of additionally identified features. However, definitions based solely on diagnosis codes often under-captured self-reported conditions, suggesting a role of broader EHR elements.
Conclusion: In this nationwide study, patient-portal-derived EHR data enabled extensive capture of patient characteristics across multiple EHR platforms, allowing better disease phenotyping compared with self-report.
Pages: 784-794
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012333/pdf/
Citations: 0
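Concordance between EHR-derived and self-reported conditions is measured in the abstract above with Cohen's Kappa, which corrects raw agreement for the agreement expected by chance. A small sketch under assumed data follows; the function name and the ten-participant example are hypothetical, not from the INSPIRE analysis:

```python
def cohens_kappa(a, b):
    """Chance-corrected agreement between two label sequences of equal length."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    categories = set(a) | set(b)
    # Agreement expected if the two sources labelled independently
    # with their own marginal frequencies.
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    # Undefined (division by zero) when both sources use a single identical label.
    return (observed - expected) / (1 - expected)


# Hypothetical hypertension flags for ten participants (1 = condition present).
ehr_phenotype = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
self_report = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

kappa = cohens_kappa(ehr_phenotype, self_report)
print(round(kappa, 3))
```

Here the two sources agree on 9 of 10 participants, yet kappa is noticeably below 0.9 because most of that agreement comes from the common "absent" label.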
Mitigation of outcome conflation in predicting patient outcomes using electronic health records.
IF 4.7, Zone 2, Medicine
Journal of the American Medical Informatics Association. Pub Date: 2025-05-01. DOI: 10.1093/jamia/ocaf033
S Momsen Reincke, Camilo Espinosa, Philip Chung, Tomin James, Eloïse Berson, Nima Aghaeepour
Objectives: Artificial intelligence (AI) models utilizing electronic health record data for disease prediction can enhance risk stratification but may lack specificity, which is crucial for reducing the economic and psychological burdens associated with false positives. This study aims to evaluate the impact of confounders on the specificity of single-outcome prediction models and assess the effectiveness of a multi-class architecture in mitigating outcome conflation.
Materials and methods: We evaluated a state-of-the-art model predicting pancreatic cancer from disease code sequences in an independent cohort of 2.3 million patients and compared this single-outcome model with a multi-class model designed to predict multiple cancer types simultaneously. Additionally, we conducted a clinical simulation experiment to investigate the impact of confounders on the specificity of single-outcome prediction models.
Results: While we were able to independently validate the pancreatic cancer prediction model, we found that its prediction scores were also correlated with ovarian cancer, suggesting conflation of outcomes due to underlying confounders. Building on this observation, we demonstrate that the specificity of single-outcome prediction models is impaired by confounders using a clinical simulation experiment. Introducing a multi-class architecture improves specificity in predicting cancer types compared to the single-outcome model while preserving performance, mitigating the conflation of outcomes in both the real-world and simulated contexts.
Discussion: Our results highlight the risk of outcome conflation in single-outcome AI prediction models and demonstrate the effectiveness of a multi-class approach in mitigating this issue.
Conclusion: The number of predicted outcomes needs to be carefully considered when employing AI disease risk prediction models.
Pages: 920-927
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012356/pdf/
Citations: 0
Correction to: Inpatient nurses' preferences and decisions with risk information visualization.
IF 4.7, Zone 2, Medicine
Journal of the American Medical Informatics Association. Pub Date: 2025-05-01. DOI: 10.1093/jamia/ocaf028
Pages: 980
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012332/pdf/
Citations: 0
Correction to: Development and evaluation of a training curriculum to engage researchers on accessing and analyzing the All of Us data.
IF 4.7, Zone 2, Medicine
Journal of the American Medical Informatics Association. Pub Date: 2025-05-01. DOI: 10.1093/jamia/ocaf044
Pages: 981
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012360/pdf/
Citations: 0
Diversity, equity, and inclusion matter for biomedical and health informatics.
IF 4.7, Zone 2, Medicine
Journal of the American Medical Informatics Association. Pub Date: 2025-05-01. DOI: 10.1093/jamia/ocaf057
Suzanne Bakken
Volume 32, issue 5, pages 773-774
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012337/pdf/
Citations: 0
Emerging algorithmic bias: fairness drift as the next dimension of model maintenance and sustainability.
IF 4.7, Zone 2, Medicine
Journal of the American Medical Informatics Association. Pub Date: 2025-05-01. DOI: 10.1093/jamia/ocaf039
Sharon E Davis, Chad Dorn, Daniel J Park, Michael E Matheny
Objectives: While performance drift of clinical prediction models is well-documented, the potential for algorithmic biases to emerge post-deployment has had limited characterization. A better understanding of how temporal model performance may shift across subpopulations is required to incorporate fairness drift into model maintenance strategies.
Materials and methods: We explore fairness drift in a national population over 11 years, with and without model maintenance aimed at sustaining population-level performance. We trained random forest models predicting 30-day post-surgical readmission, mortality, and pneumonia using 2013 data from US Department of Veterans Affairs facilities. We evaluated performance quarterly from 2014 to 2023 by self-reported race and sex. We estimated discrimination, calibration, and accuracy, and operationalized fairness using metric parity measured as the gap between disadvantaged and advantaged groups.
Results: Our cohort included 1 739 666 surgical cases. We observed fairness drift in both the original and temporally updated models. Model updating had a larger impact on overall performance than fairness gaps. During periods of stable fairness, updating models at the population level increased, decreased, or did not impact fairness gaps. During periods of fairness drift, updating models restored fairness in some cases and exacerbated fairness gaps in others.
Discussion: This exploratory study highlights that algorithmic fairness cannot be assured through one-time assessments during model development. Temporal changes in fairness may take multiple forms and interact with model updating strategies in unanticipated ways.
Conclusion: Equitable and sustainable clinical artificial intelligence deployments will require novel methods to monitor algorithmic fairness, detect emerging bias, and adopt model updates that promote fairness.
Pages: 845-854
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012346/pdf/
Citations: 0
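The abstract above operationalizes fairness as metric parity: the gap in a performance metric between a disadvantaged and an advantaged group, tracked quarter by quarter. One way this might be sketched is shown below; accuracy is used as the metric purely for brevity, and the group labels "A"/"B", the quarter keys, and the records are all invented for illustration rather than drawn from the study:

```python
def accuracy(labels, preds):
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


def fairness_gap(records, disadvantaged, advantaged, metric=accuracy):
    """Metric for the disadvantaged group minus metric for the advantaged group.

    records is a list of (group, label, prediction) tuples.
    """
    def group_metric(group):
        rows = [(y, p) for g, y, p in records if g == group]
        labels, preds = zip(*rows)
        return metric(labels, preds)

    return group_metric(disadvantaged) - group_metric(advantaged)


# Hypothetical evaluation records for two quarters.
quarters = {
    "2014Q1": [("A", 1, 1), ("A", 0, 0), ("B", 1, 1), ("B", 0, 0)],
    "2014Q2": [("A", 1, 0), ("A", 0, 0), ("B", 1, 1), ("B", 0, 0)],
}

gaps = {
    quarter: fairness_gap(records, disadvantaged="A", advantaged="B")
    for quarter, records in quarters.items()
}
print(gaps)  # a gap moving away from 0 over time would indicate fairness drift
```

Swapping `accuracy` for a discrimination or calibration metric gives the corresponding parity gap, which is the quantity a monitoring process would watch alongside overall performance.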
Utilizing large language models for detecting hospital-acquired conditions: an empirical study on pulmonary embolism.
IF 4.7, Zone 2, Medicine
Journal of the American Medical Informatics Association. Pub Date: 2025-05-01. DOI: 10.1093/jamia/ocaf048
Cheligeer Cheligeer, Danielle A Southern, Jun Yan, Guosong Wu, Jie Pan, Seungwon Lee, Elliot A Martin, Hamed Jafarpour, Cathy A Eastwood, Yong Zeng, Hude Quan
Objectives: Adverse event detection from Electronic Medical Records (EMRs) is challenging due to the low incidence of the event, variability in clinical documentation, and the complexity of data formats. Pulmonary embolism as an adverse event (PEAE) is particularly difficult to identify using existing approaches. This study aims to develop and evaluate a Large Language Model (LLM)-based framework for detecting PEAE from unstructured narrative data in EMRs.
Materials and methods: We conducted a chart review of adult patients (aged 18-100) admitted to tertiary-care hospitals in Calgary, Alberta, Canada, between 2017 and 2022. We developed an LLM-based detection framework consisting of three modules: evidence extraction (implementing both keyword-based and semantic similarity-based filtering methods), discharge information extraction (focusing on six key clinical sections), and PEAE detection. Four open-source LLMs (Llama3, Mistral-7B, Gemma, and Phi-3) were evaluated using positive predictive value, sensitivity, specificity, and F1-score. Model performance for population-level surveillance was assessed at yearly, quarterly, and monthly granularities.
Results: The chart review included 10 066 patients, with 40 cases of PEAE identified (0.4% prevalence). All four LLMs demonstrated high sensitivity (87.5-100%) and specificity (94.9-98.9%) across different experimental conditions. Gemma achieved the highest F1-score (28.11%) using keyword-based retrieval with discharge summary inclusion, along with 98.4% specificity, 87.5% sensitivity, and 99.95% negative predictive value. Keyword-based filtering reduced the median chunks per patient from 789 to 310, while semantic filtering further reduced this to 9 chunks. Including discharge summaries improved performance metrics across most models. For population-level surveillance, all models showed strong correlation with actual PEAE trends at yearly granularity (r=0.92-0.99), with Llama3 achieving the highest correlation (0.988).
Discussion: The results of our method for PEAE detection using EMR notes demonstrate high sensitivity and specificity across all four tested LLMs, indicating strong performance in distinguishing PEAE from non-PEAE cases. However, the low incidence rate of PEAE contributed to a lower PPV. The keyword-based chunking approach consistently outperformed semantic similarity-based methods, achieving higher F1 scores and PPV, underscoring the importance of domain knowledge in text segmentation. Including discharge summaries further enhanced performance metrics. Our population-based analysis revealed better performance for yearly trends compared to monthly granularity, suggesting the framework's utility for long-term surveillance despite dataset imbalance. Error analysis identified contextual misinterpretation, terminology confusion, and preprocessing limitations as key challenges for future improvement.
Pages: 876-884
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012340/pdf/
Citations: 0
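The discussion above attributes the low PPV to the low incidence of PEAE. A back-of-the-envelope sketch makes the arithmetic concrete, using the cohort size, case count, sensitivity, and specificity quoted in the abstract; the function name is an assumption, and because the published rates are rounded the derived counts are fractional and only approximate the reported figures:

```python
def confusion_from_rates(n, positives, sensitivity, specificity):
    """Expected confusion-matrix cells implied by cohort size and error rates."""
    negatives = n - positives
    tp = sensitivity * positives
    fn = positives - tp
    tn = specificity * negatives
    fp = negatives - tn
    return tp, fp, tn, fn


# Figures quoted in the abstract: 10 066 patients, 40 PEAE cases,
# 87.5% sensitivity, 98.4% specificity.
tp, fp, tn, fn = confusion_from_rates(10066, 40, 0.875, 0.984)

ppv = tp / (tp + fp)              # share of flagged patients who truly had PEAE
npv = tn / (tn + fn)              # share of unflagged patients who truly did not
f1 = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of PPV and sensitivity

print(f"PPV={ppv:.3f} NPV={npv:.4f} F1={f1:.3f}")
```

With only 40 true cases among roughly 10 000 patients, a false-positive rate of 1.6% still produces around 160 false alarms against 35 true detections, so PPV lands near 0.18 and F1 near 0.30, in the same range as the reported 28.11%, while NPV stays close to 99.95%.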