Joshua Trujeque, R Adams Dudley, Nathan Mesfin, Nicholas E Ingraham, Isai Ortiz, Ann Bangerter, Anjan Chakraborty, Dalton Schutte, Jeremy Yeung, Ying Liu, Alicia Woodward-Abel, Emma Bromley, Rui Zhang, Lisa A Brenner, Joseph A Simonetti
{"title":"Comparison of six natural language processing approaches to assessing firearm access in Veterans Health Administration electronic health records.","authors":"Joshua Trujeque, R Adams Dudley, Nathan Mesfin, Nicholas E Ingraham, Isai Ortiz, Ann Bangerter, Anjan Chakraborty, Dalton Schutte, Jeremy Yeung, Ying Liu, Alicia Woodward-Abel, Emma Bromley, Rui Zhang, Lisa A Brenner, Joseph A Simonetti","doi":"10.1093/jamia/ocae169","DOIUrl":"10.1093/jamia/ocae169","url":null,"abstract":"<p><strong>Objective: </strong>Access to firearms is associated with increased suicide risk. Our aim was to develop a natural language processing approach to characterizing firearm access in clinical records.</p><p><strong>Materials and methods: </strong>We used clinical notes from 36 685 Veterans Health Administration (VHA) patients between April 10, 2023 and April 10, 2024. We expanded preexisting firearm term sets using subject matter experts and generated 250-character snippets around each firearm term appearing in notes. Annotators labeled 3000 snippets into three classes. Using these annotated snippets, we compared four nonneural machine learning models (random forest, bagging, gradient boosting, logistic regression with ridge penalization) and two versions of Bidirectional Encoder Representations from Transformers, or BERT (specifically, BioBERT and Bio-ClinicalBERT) for classifying firearm access as \"definite access\", \"definitely no access\", or \"other\".</p><p><strong>Results: </strong>Firearm terms were identified in 36 685 patient records (41.3%), 33.7% of snippets were categorized as definite access, 9.0% as definitely no access, and 57.2% as \"other\". 
Among models classifying firearm access, five of six had acceptable performance, with BioBERT and Bio-ClinicalBERT performing best, with F1s of 0.876 (95% confidence interval, 0.874-0.879) and 0.896 (95% confidence interval, 0.894-0.899), respectively.</p><p><strong>Discussion and conclusion: </strong>Firearm-related terminology is common in the clinical records of VHA patients. The ability to use text to identify and characterize patients' firearm access could enhance suicide prevention efforts, and five of our six models could be used to identify patients for clinical interventions.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"113-118"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648724/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142631461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Betina Idnay, Gongbo Zhang, Fangyi Chen, Casey N Ta, Matthew W Schelke, Karen Marder, Chunhua Weng
{"title":"Mini-mental status examination phenotyping for Alzheimer's disease patients using both structured and narrative electronic health record features.","authors":"Betina Idnay, Gongbo Zhang, Fangyi Chen, Casey N Ta, Matthew W Schelke, Karen Marder, Chunhua Weng","doi":"10.1093/jamia/ocae274","DOIUrl":"10.1093/jamia/ocae274","url":null,"abstract":"<p><strong>Objective: </strong>This study aims to automate the prediction of Mini-Mental State Examination (MMSE) scores, a widely adopted standard for cognitive assessment in patients with Alzheimer's disease, using natural language processing (NLP) and machine learning (ML) on structured and unstructured EHR data.</p><p><strong>Materials and methods: </strong>We extracted demographic data, diagnoses, medications, and unstructured clinical visit notes from the EHRs. We used Latent Dirichlet Allocation (LDA) for topic modeling and Term-Frequency Inverse Document Frequency (TF-IDF) for n-grams. In addition, we extracted meta-features such as age, ethnicity, and race. Model training and evaluation employed eXtreme Gradient Boosting (XGBoost), Stochastic Gradient Descent Regressor (SGDRegressor), and Multi-Layer Perceptron (MLP).</p><p><strong>Results: </strong>We analyzed 1654 clinical visit notes collected between September 2019 and June 2023 for 1000 Alzheimer's disease patients. The average MMSE score was 20, with patients averaging 76.4 years old, 54.7% female, and 54.7% identifying as White. The best-performing model (ie, lowest root mean squared error (RMSE)) is MLP, which achieved an RMSE of 5.53 on the validation set using n-grams, indicating superior prediction performance over other models and feature sets. The RMSE on the test set was 5.85.</p><p><strong>Discussion: </strong>This study developed a ML method to predict MMSE scores from unstructured clinical notes, demonstrating the feasibility of utilizing NLP to support cognitive assessment. 
Future work should focus on refining the model and evaluating its clinical relevance across diverse settings.</p><p><strong>Conclusion: </strong>We contributed a model for automating MMSE estimation using EHR features, potentially transforming cognitive assessment for Alzheimer's patients and paving the way for more informed clinical decisions and cohort identification.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"119-128"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648712/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142631470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Thomas Savage, John Wang, Robert Gallo, Abdessalem Boukil, Vishwesh Patel, Seyed Amir Ahmad Safavi-Naini, Ali Soroush, Jonathan H Chen
{"title":"Large language model uncertainty proxies: discrimination and calibration for medical diagnosis and treatment.","authors":"Thomas Savage, John Wang, Robert Gallo, Abdessalem Boukil, Vishwesh Patel, Seyed Amir Ahmad Safavi-Naini, Ali Soroush, Jonathan H Chen","doi":"10.1093/jamia/ocae254","DOIUrl":"10.1093/jamia/ocae254","url":null,"abstract":"<p><strong>Introduction: </strong>The inability of large language models (LLMs) to communicate uncertainty is a significant barrier to their use in medicine. Before LLMs can be integrated into patient care, the field must assess methods to estimate uncertainty in ways that are useful to physician-users.</p><p><strong>Objective: </strong>Evaluate the ability for uncertainty proxies to quantify LLM confidence when performing diagnosis and treatment selection tasks by assessing the properties of discrimination and calibration.</p><p><strong>Methods: </strong>We examined confidence elicitation (CE), token-level probability (TLP), and sample consistency (SC) proxies across GPT3.5, GPT4, Llama2, and Llama3. Uncertainty proxies were evaluated against 3 datasets of open-ended patient scenarios.</p><p><strong>Results: </strong>SC discrimination outperformed TLP and CE methods. SC by sentence embedding achieved the highest discriminative performance (ROC AUC 0.68-0.79), yet with poor calibration. SC by GPT annotation achieved the second-best discrimination (ROC AUC 0.66-0.74) with accurate calibration. Verbalized confidence (CE) was found to consistently overestimate model confidence.</p><p><strong>Discussion and conclusions: </strong>SC is the most effective method for estimating LLM uncertainty of the proxies evaluated. SC by sentence embedding can effectively estimate uncertainty if the user has a set of reference cases with which to re-calibrate their results, while SC by GPT annotation is the more effective method if the user does not have reference cases and requires accurate raw calibration. 
Our results confirm LLMs are consistently over-confident when verbalizing their confidence (CE).</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"139-149"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648734/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142479235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andrew J Zimolzak, Sundas P Khan, Hardeep Singh, Jessica A Davila
{"title":"Application of a digital quality measure for cancer diagnosis in Epic Cosmos.","authors":"Andrew J Zimolzak, Sundas P Khan, Hardeep Singh, Jessica A Davila","doi":"10.1093/jamia/ocae253","DOIUrl":"10.1093/jamia/ocae253","url":null,"abstract":"<p><strong>Objectives: </strong>Missed and delayed cancer diagnoses are common, harmful, and often preventable. We previously validated a digital quality measure (dQM) of emergency presentation (EP) of lung cancer in 2 US health systems. This study aimed to apply the dQM to a new national electronic health record (EHR) database and examine demographic associations.</p><p><strong>Materials and methods: </strong>We applied the dQM (emergency encounter followed by new lung cancer diagnosis within 30 days) to Epic Cosmos, a deidentified database covering 184 million US patients. We examined dQM associations with sociodemographic factors.</p><p><strong>Results: </strong>The overall EP rate was 19.6%. EP rate was higher in Black vs White patients (24% vs 19%, P < .001) and patients with younger age, higher social vulnerability, lower-income ZIP code, and self-reported transport difficulties.</p><p><strong>Discussion: </strong>We successfully applied a dQM based on cancer EP to the largest US EHR database.</p><p><strong>Conclusion: </strong>This dQM could be a marker for sociodemographic vulnerabilities in cancer diagnosis.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"227-229"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648705/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142479220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aaron S Eisman, Elizabeth S Chen, Wen-Chih Wu, Karen M Crowley, Dilum P Aluthge, Katherine Brown, Indra Neil Sarkar
{"title":"Learning health system linchpins: information exchange and a common data model.","authors":"Aaron S Eisman, Elizabeth S Chen, Wen-Chih Wu, Karen M Crowley, Dilum P Aluthge, Katherine Brown, Indra Neil Sarkar","doi":"10.1093/jamia/ocae277","DOIUrl":"10.1093/jamia/ocae277","url":null,"abstract":"<p><strong>Objective: </strong>To demonstrate the potential for a centrally managed health information exchange standardized to a common data model (HIE-CDM) to facilitate semantic data flow needed to support a learning health system (LHS).</p><p><strong>Materials and methods: </strong>The Rhode Island Quality Institute operates the Rhode Island (RI) statewide HIE, which aggregates RI health data for more than half of the state's population from 47 data partners. We standardized HIE data to the Observational Medical Outcomes Partnership (OMOP) CDM. Atherosclerotic cardiovascular disease (ASCVD) risk and primary prevention practices were selected to demonstrate LHS semantic data flow from 2013 to 2023.</p><p><strong>Results: </strong>We calculated longitudinal 10-year ASCVD risk on 62,999 individuals. Nearly two-thirds had ASCVD risk factors from more than one data partner. This enabled granular tracking of individual ASCVD risk, primary prevention (ie, statin therapy), and incident disease. The population was on statins for fewer than half of the guideline-recommended days. We also found that individuals receiving care at Federally Qualified Health Centers were more likely to have unfavorable ASCVD risk profiles and more likely to be on statins. CDM transformation reduced data heterogeneity through a unified health record that adheres to defined terminologies per OMOP domain.</p><p><strong>Discussion: </strong>We demonstrated the potential for an HIE-CDM to enable observational population health research. 
We also showed how to leverage existing health information technology infrastructure and health data best practices to break down LHS barriers.</p><p><strong>Conclusion: </strong>HIE-CDM facilitates knowledge curation and health system intervention development at the individual, health system, and population levels.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"9-19"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648737/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142631463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xubing Hao, Xiaojin Li, Yan Huang, Jay Shi, Rashmie Abeysinghe, Cui Tao, Kirk Roberts, Guo-Qiang Zhang, Licong Cui
{"title":"Quantitatively assessing the impact of the quality of SNOMED CT subtype hierarchy on cohort queries.","authors":"Xubing Hao, Xiaojin Li, Yan Huang, Jay Shi, Rashmie Abeysinghe, Cui Tao, Kirk Roberts, Guo-Qiang Zhang, Licong Cui","doi":"10.1093/jamia/ocae272","DOIUrl":"10.1093/jamia/ocae272","url":null,"abstract":"<p><strong>Objective: </strong>SNOMED CT provides a standardized terminology for clinical concepts, allowing cohort queries over heterogeneous clinical data including Electronic Health Records (EHRs). While it is intuitive that missing and inaccurate subtype (or is-a) relations in SNOMED CT reduce the recall and precision of cohort queries, the extent of these impacts has not been formally assessed. This study fills this gap by developing quantitative metrics to measure these impacts and performing statistical analysis on their significance.</p><p><strong>Material and methods: </strong>We used the Optum de-identified COVID-19 Electronic Health Record dataset. We defined micro-averaged and macro-averaged recall and precision metrics to assess the impact of missing and inaccurate is-a relations on cohort queries. Both practical and simulated analyses were performed. Practical analyses involved 407 missing and 48 inaccurate is-a relations confirmed by domain experts, with statistical testing using Wilcoxon signed-rank tests. Simulated analyses used two random sets of 400 is-a relations to simulate missing and inaccurate is-a relations.</p><p><strong>Results: </strong>Wilcoxon signed-rank tests from both practical and simulated analyses (P-values < .001) showed that missing is-a relations significantly reduced the micro- and macro-averaged recall, and inaccurate is-a relations significantly reduced the micro- and macro-averaged precision.</p><p><strong>Discussion: </strong>The introduced impact metrics can assist SNOMED CT maintainers in prioritizing critical hierarchical defects for quality enhancement. 
These metrics are generally applicable for assessing the quality impact of a terminology's subtype hierarchy on its cohort query applications.</p><p><strong>Conclusion: </strong>Our results indicate a significant impact of missing and inaccurate is-a relations in SNOMED CT on the recall and precision of cohort queries. Our work highlights the importance of high-quality terminology hierarchy for cohort queries over EHR data and provides valuable insights for prioritizing quality improvements of SNOMED CT's hierarchy.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"89-96"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648736/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142631474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advancing a learning health system through biomedical and health informatics.","authors":"Suzanne Bakken","doi":"10.1093/jamia/ocae307","DOIUrl":"10.1093/jamia/ocae307","url":null,"abstract":"","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":"32 1","pages":"1-2"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648707/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142839965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anna Northrop, Anika Christofferson, Saumya Umashankar, Michelle Melisko, Paolo Castillo, Thelma Brown, Diane Heditsian, Susie Brain, Carol Simmons, Tina Hieken, Kathryn J Ruddy, Candace Mainor, Anosheh Afghahi, Sarah Tevis, Anne Blaes, Irene Kang, Adam Asare, Laura Esserman, Dawn L Hershman, Amrita Basu
{"title":"Implementation and impact of an electronic patient reported outcomes system in a phase II multi-site adaptive platform clinical trial for early-stage breast cancer.","authors":"Anna Northrop, Anika Christofferson, Saumya Umashankar, Michelle Melisko, Paolo Castillo, Thelma Brown, Diane Heditsian, Susie Brain, Carol Simmons, Tina Hieken, Kathryn J Ruddy, Candace Mainor, Anosheh Afghahi, Sarah Tevis, Anne Blaes, Irene Kang, Adam Asare, Laura Esserman, Dawn L Hershman, Amrita Basu","doi":"10.1093/jamia/ocae190","DOIUrl":"10.1093/jamia/ocae190","url":null,"abstract":"<p><strong>Objectives: </strong>We describe the development and implementation of a system for monitoring patient-reported adverse events and quality of life using electronic Patient Reported Outcome (ePRO) instruments in the I-SPY2 Trial, a phase II clinical trial for locally advanced breast cancer. We describe the administration of technological, workflow, and behavior change interventions and their associated impact on questionnaire completion.</p><p><strong>Materials and methods: </strong>Using the OpenClinica electronic data capture system, we developed rules-based logic to build automated ePRO surveys, customized to the I-SPY2 treatment schedule. We piloted ePROs at the University of California, San Francisco (UCSF) to optimize workflow in the context of trial treatment scenarios and staggered rollout of the ePRO system to 26 sites to ensure effective implementation of the technology.</p><p><strong>Results: </strong>Increasing ePRO completion requires workflow solutions and research staff engagement. Over two years, we increased baseline survey completion from 25% to 80%. The majority of patients completed between 30% and 75% of the questionnaires they received, with no statistically significant variation in survey completion by age, race or ethnicity. 
Patients who completed the screening timepoint questionnaire were significantly more likely to complete more of the surveys they received at later timepoints (mean completion of 74.1% vs 35.5%, P < .0001). Baseline PROMIS social functioning and grade 2 or more PRO-CTCAE interference of Abdominal Pain, Decreased Appetite, Dizziness and Shortness of Breath was associated with lower survey completion rates.</p><p><strong>Discussion and conclusion: </strong>By implementing ePROs, we have the potential to increase efficiency and accuracy of patient-reported clinical trial data collection, while improving quality of care, patient safety, and health outcomes. Our method is accessible across demographics and facilitates an ease of data collection and sharing across nationwide sites. We identify predictors of decreased completion that can optimize resource allocation by better targeting efforts such as in-person outreach, staff engagement, a robust technical workflow, and increased monitoring to improve overall completion rates.</p><p><strong>Trial registration: </strong>https://clinicaltrials.gov/study/NCT01042379.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"172-180"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648710/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142001162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sukanya Mohapatra, Mirna Issa, Vedrana Ivezic, Rose Doherty, Stephanie Marks, Esther Lan, Shawn Chen, Keith Rozett, Lauren Cullen, Wren Reynolds, Rose Rocchio, Gregg C Fonarow, Michael K Ong, William F Speier, Corey W Arnold
{"title":"Increasing adherence and collecting symptom-specific biometric signals in remote monitoring of heart failure patients: a randomized controlled trial.","authors":"Sukanya Mohapatra, Mirna Issa, Vedrana Ivezic, Rose Doherty, Stephanie Marks, Esther Lan, Shawn Chen, Keith Rozett, Lauren Cullen, Wren Reynolds, Rose Rocchio, Gregg C Fonarow, Michael K Ong, William F Speier, Corey W Arnold","doi":"10.1093/jamia/ocae221","DOIUrl":"10.1093/jamia/ocae221","url":null,"abstract":"<p><strong>Objectives: </strong>Mobile health (mHealth) regimens can improve health through the continuous monitoring of biometric parameters paired with appropriate interventions. However, adherence to monitoring tends to decay over time. Our randomized controlled trial sought to determine: (1) if a mobile app with gamification and financial incentives significantly increases adherence to mHealth monitoring in a population of heart failure patients; and (2) if activity data correlate with disease-specific symptoms.</p><p><strong>Materials and methods: </strong>We recruited individuals with heart failure into a prospective 180-day monitoring study with 3 arms. All 3 arms included monitoring with a connected weight scale and an activity tracker. The second arm included an additional mobile app with gamification, and the third arm included the mobile app and a financial incentive awarded based on adherence to mobile monitoring.</p><p><strong>Results: </strong>We recruited 111 heart failure patients into the study. We found that the arm including the financial incentive led to significantly higher adherence to activity tracker (95% vs 72.2%, P = .01) and weight (87.5% vs 69.4%, P = .002) monitoring compared to the arm that included the monitoring devices alone. 
Furthermore, we found a significant correlation between daily steps and daily symptom severity.</p><p><strong>Discussion and conclusion: </strong>Our findings indicate that mobile apps with added engagement features can be useful tools for improving adherence over time and may thus increase the impact of mHealth-driven interventions. Additionally, activity tracker data can provide passive monitoring of disease burden that may be used to predict future events.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"181-192"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648719/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142037585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Markus Ralf Bujotzek, Ünal Akünal, Stefan Denner, Peter Neher, Maximilian Zenk, Eric Frodl, Astha Jaiswal, Moon Kim, Nicolai R Krekiehn, Manuel Nickel, Richard Ruppel, Marcus Both, Felix Döllinger, Marcel Opitz, Thorsten Persigehl, Jens Kleesiek, Tobias Penzkofer, Klaus Maier-Hein, Andreas Bucher, Rickmer Braren
{"title":"Real-world federated learning in radiology: hurdles to overcome and benefits to gain.","authors":"Markus Ralf Bujotzek, Ünal Akünal, Stefan Denner, Peter Neher, Maximilian Zenk, Eric Frodl, Astha Jaiswal, Moon Kim, Nicolai R Krekiehn, Manuel Nickel, Richard Ruppel, Marcus Both, Felix Döllinger, Marcel Opitz, Thorsten Persigehl, Jens Kleesiek, Tobias Penzkofer, Klaus Maier-Hein, Andreas Bucher, Rickmer Braren","doi":"10.1093/jamia/ocae259","DOIUrl":"10.1093/jamia/ocae259","url":null,"abstract":"<p><strong>Objective: </strong>Federated Learning (FL) enables collaborative model training while keeping data locally. Currently, most FL studies in radiology are conducted in simulated environments due to numerous hurdles impeding its translation into practice. The few existing real-world FL initiatives rarely communicate specific measures taken to overcome these hurdles. To bridge this significant knowledge gap, we propose a comprehensive guide for real-world FL in radiology. Minding efforts to implement real-world FL, there is a lack of comprehensive assessments comparing FL to less complex alternatives in challenging real-world settings, which we address through extensive benchmarking.</p><p><strong>Materials and methods: </strong>We developed our own FL infrastructure within the German Radiological Cooperative Network (RACOON) and demonstrated its functionality by training FL models on lung pathology segmentation tasks across six university hospitals. Insights gained while establishing our FL initiative and running the extensive benchmark experiments were compiled and categorized into the guide.</p><p><strong>Results: </strong>The proposed guide outlines essential steps, identified hurdles, and implemented solutions for establishing successful FL initiatives conducting real-world experiments. 
Our experimental results prove the practical relevance of our guide and show that FL outperforms less complex alternatives in all evaluation scenarios.</p><p><strong>Discussion and conclusion: </strong>Our findings justify the efforts required to translate FL into real-world applications by demonstrating advantageous performance over alternative approaches. Additionally, they emphasize the importance of strategic organization, robust management of distributed data and infrastructure in real-world settings. With the proposed guide, we are aiming to aid future FL researchers in circumventing pitfalls and accelerating translation of FL into radiological applications.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"193-205"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648732/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142512054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}