{"title":"Towards responsible artificial intelligence in healthcare-getting real about real-world data and evidence.","authors":"Eileen Koski, Amar Das, Pei-Yun Sabrina Hsueh, Anthony Solomonides, Amanda L Joseph, Gyana Srivastava, Carl Erwin Johnson, Joseph Kannry, Bilikis Oladimeji, Amy Price, Steven Labkoff, Gnana Bharathy, Baihan Lin, Douglas Fridsma, Lee A Fleisher, Monica Lopez-Gonzalez, Reva Singh, Mark G Weiner, Robert Stolper, Russell Baris, Suzanne Sincavage, Tristan Naumann, Tayler Williams, Tien Thi Thuy Bui, Yuri Quintana","doi":"10.1093/jamia/ocaf133","DOIUrl":"https://doi.org/10.1093/jamia/ocaf133","url":null,"abstract":"<p><strong>Background: </strong>The use of real-world data (RWD) in artificial intelligence (AI) applications for healthcare offers unique opportunities but also poses complex challenges related to interpretability, transparency, safety, efficacy, bias, equity, privacy, ethics, accountability, and stakeholder engagement.</p><p><strong>Methods: </strong>A multi-stakeholder expert panel comprising healthcare professionals, AI developers, policymakers, and other stakeholders was assembled. Their task was to identify critical issues and formulate consensus recommendations, focusing on the responsible use of RWD in healthcare AI. The panel's work involved an in-person conference and workshop and extensive deliberations over several months.</p><p><strong>Results: </strong>The panel's findings revealed several critical challenges, including the necessity for data literacy and documentation, the identification and mitigation of bias, privacy and ethics considerations, and the absence of an accountability structure for stakeholder management. 
To address these, the panel proposed a series of recommendations, such as the adoption of metadata standards for RWD sources, the development of transparency frameworks and instructional labels likened to \"nutrition labels\" for AI applications, the provision of cross-disciplinary training materials, the implementation of bias detection and mitigation strategies, and the establishment of ongoing monitoring and update processes.</p><p><strong>Conclusion: </strong>Guidelines and resources focused on the responsible use of RWD in healthcare AI are essential for developing safe, effective, equitable, and trustworthy applications. The proposed recommendations provide a foundation for a comprehensive framework addressing the entire lifecycle of healthcare AI, emphasizing the importance of documentation, training, transparency, accountability, and multi-stakeholder engagement.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145151534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A communication-efficient federated learning algorithm to assess racial disparities in post-transplantation survival time.","authors":"Yudong Wang, Dazheng Zhang, Jiayi Tong, Xing He, Liang Li, Lichao Sun, Ashutosh M Shukla, Jiang Bian, David A Asch, Yong Chen","doi":"10.1093/jamia/ocaf138","DOIUrl":"https://doi.org/10.1093/jamia/ocaf138","url":null,"abstract":"<p><strong>Objective: </strong>Patients of different race have different outcomes following renal transplantation. Patients of different race also undergo renal transplantation at different hospitals. We used a novel decentralized multisite approach to quantitatively assess the effect of site of care on racial disparities between non-Hispanic Black (NHB) and non-Hispanic White (NHW) patients in post-transplantation survival times.</p><p><strong>Materials and methods: </strong>In this study, we develop a communication-efficient federated learning algorithm to assess site-of-care associated racial disparities based on decentralized time-to-event data, called Communication-Efficient Distributed Analysis for Racial Disparity in Time-to-event Data (CEDAR-t2e). The algorithm includes 2 modules. Module I is to estimate the site-specific proportional hazards model for time-to-event outcomes in a distributed manner, in which the Poissonization is used to simplify the estimation procedure. Based on the estimated results from Module I, Module II calculates how long the kidney failure time of NHB patients would be extended had they been admitted to transplant centers in the same distribution as NHW patients were admitted.</p><p><strong>Results: </strong>With application to United States Renal Data System data covering 39 043 patients across 73 transplant centers, we found no evidence suggesting the presence of site-of-care associated racial disparities in post-transplantation survival times. 
In particular, restricting to one year after transplantation, the counterfactual graft failure time would have been extended by only 0.61 days on average if NHB had the same admission distribution to transplant centers as NHW patients.</p><p><strong>Discussion: </strong>The proposed approach offers a quantitative measure to evaluate site-of-care associated racial disparities.</p><p><strong>Conclusion: </strong>Our approach has the potential to be extended to investigate site-of-care related disparities in other time-to-event outcomes, thus promoting health equity and improving patient health in various fields.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
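The counterfactual quantity described in Module II can be sketched as a reweighting of site-specific expected survival by admission distributions. A minimal illustration with hypothetical numbers (not the CEDAR-t2e algorithm itself, which estimates these quantities from decentralized time-to-event data):

```python
# Illustrative sketch only: compare observed mean survival of NHB patients
# with the counterfactual mean had NHB patients been admitted to transplant
# centers in the same proportions as NHW patients. All numbers are invented.

def mean_survival(site_means, admission_dist):
    """Expected survival time under a given site-admission distribution."""
    assert abs(sum(admission_dist) - 1.0) < 1e-9
    return sum(m * p for m, p in zip(site_means, admission_dist))

# Hypothetical mean graft-survival (days) for NHB patients at three centers,
# plus NHB and NHW admission proportions across those centers.
site_means_nhb = [300.0, 320.0, 310.0]
dist_nhb = [0.5, 0.2, 0.3]
dist_nhw = [0.3, 0.4, 0.3]

observed = mean_survival(site_means_nhb, dist_nhb)
counterfactual = mean_survival(site_means_nhb, dist_nhw)
site_effect = counterfactual - observed  # days attributable to site mix alone
```

A near-zero `site_effect` (0.61 days in the paper) is what "no evidence of site-of-care associated disparity" corresponds to in this framing.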
{"title":"Development and application of desiderata for automated clinical ordering.","authors":"Sameh N Saleh, Kevin B Johnson","doi":"10.1093/jamia/ocaf152","DOIUrl":"https://doi.org/10.1093/jamia/ocaf152","url":null,"abstract":"<p><strong>Introduction: </strong>Automation of clinical orders in electronic health records (EHRs) has the potential to reduce clinician burden and enhance patient safety. However, determining which orders are appropriate for automation requires a structured framework to ensure clinical validity, transparency, and safety.</p><p><strong>Objective: </strong>To develop and validate a framework of desiderata for assessing the appropriateness of automating clinical orders in EHRs and to demonstrate its operational value in a live health system dataset.</p><p><strong>Materials and methods: </strong>The study comprised 4 phases to move from concept generation to real-world demonstration. First, we conducted focus group analyses using grounded theory to identify themes and developed desiderata informed by these themes and existing literature. We validated the desiderata by surveying clinicians at a single institution, presenting 10 use cases and assessing perceived appropriateness, cognitive support, and patient safety using a 4-point Likert scale. Survey results were compared to a priori appropriateness designations using t-tests. To evaluate operational impact, we analyzed one year of order-based alerts and orders (1.4 million alert firings and 44.1 million orders, respectively) using filtering rules and association rule mining to identify candidate orders for automation and their impact.</p><p><strong>Results: </strong>We identified 8 desiderata for automated order appropriateness: logical consistency, data provenance, order transparency, context permanence, monitoring plans, trigger consistency, care team empowerment, and system accountability. 
Use cases deemed appropriate based on these criteria received significantly higher scores for appropriateness (3.13 ± 0.84 vs 2.30 ± 0.99), cognitive support (3.08 ± 0.82 vs 2.25 ± 0.94), and patient safety (3.08 ± 0.86 vs 2.21 ± 0.98) (all P < .001) compared to those considered inappropriate. Operational analysis revealed an alert firing 19 109 times annually, with a 96% signed order rate, where automation could save an estimated 26.5 provider hours per year. Additionally, an association rule with 16 628 occurrences (68.4% confidence) suggested automation could save 15.8 hours annually and yield 8000 additional appropriate orders.</p><p><strong>Discussion: </strong>The desiderata align with clinician perceptions and provide a structured approach for evaluating automated orders. Our findings highlight the potential for automation of certain clinical orders to improve cognitive support while maintaining patient safety.</p><p><strong>Conclusion: </strong>Healthcare systems should use these desiderata, coupled with data mining techniques, to systematically identify and govern appropriate automated orders. Further research is needed to validate operational scalability.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145126405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
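The operational quantities above (rule confidence, annual hours saved) reduce to simple arithmetic. A hedged sketch, where the antecedent count and the 5-second per-order interaction time are assumed values chosen only to roughly reproduce the reported figures:

```python
# Hypothetical arithmetic behind the operational analysis. The antecedent
# count (24 310) and 5 s per signed order are illustrative assumptions, not
# values reported in the paper.

def confidence(n_antecedent, n_both):
    """Association-rule confidence for A -> B: P(B | A)."""
    return n_both / n_antecedent

def hours_saved_per_year(n_events, seconds_per_event):
    """Annual provider time recovered if each event no longer needs a click."""
    return n_events * seconds_per_event / 3600.0

conf_ab = confidence(24310, 16628)        # ~0.684, matching the reported rule
hours = hours_saved_per_year(19109, 5.0)  # ~26.5 h/year for the alert above
```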
Dimitrios Bounias, Lina Simons, Michael Baumgartner, Chris Ehring, Peter Neher, Lorenz A Kapsner, Balint Kovacs, Ralf Floca, Paul F Jaeger, Jessica Eberle, Dominique Hadler, Frederik B Laun, Sabine Ohlmeyer, Lena Maier-Hein, Michael Uder, Evelyn Wenkel, Klaus H Maier-Hein, Sebastian Bickelhaupt
{"title":"Including AI in diffusion-weighted breast MRI has potential to increase reader confidence and reduce workload.","authors":"Dimitrios Bounias, Lina Simons, Michael Baumgartner, Chris Ehring, Peter Neher, Lorenz A Kapsner, Balint Kovacs, Ralf Floca, Paul F Jaeger, Jessica Eberle, Dominique Hadler, Frederik B Laun, Sabine Ohlmeyer, Lena Maier-Hein, Michael Uder, Evelyn Wenkel, Klaus H Maier-Hein, Sebastian Bickelhaupt","doi":"10.1093/jamia/ocaf156","DOIUrl":"https://doi.org/10.1093/jamia/ocaf156","url":null,"abstract":"<p><strong>Objectives: </strong>Breast diffusion-weighted imaging (DWI) has shown potential as a standalone imaging technique for certain indications, eg, supplemental screening of women with dense breasts. This study evaluates an artificial intelligence (AI)-powered computer-aided diagnosis (CAD) system for clinical interpretation and workload reduction in breast DWI.</p><p><strong>Materials and methods: </strong>This retrospective IRB-approved study included: n = 824 examinations for model development (2017-2020) and n = 235 for evaluation (01/2021-06/2021). Readings were performed by three readers using either the AI-CAD or manual readings. BI-RADS-like (Breast Imaging Reporting and Data System) classification was based on DWI. Histopathology served as ground truth. The model was nnDetection-based, trained using 5-fold cross-validation and ensembling. Statistical significance was determined using McNemar's test. Inter-rater agreement was calculated using Cohen's kappa. Model performance was calculated using the area under the receiver operating curve (AUC).</p><p><strong>Results: </strong>The AI-augmented approach significantly reduced BI-RADS-like 3 calls in breast DWI by 29% (P =.019) and increased interrater agreement (0.57 ± 0.10 vs 0.49 ± 0.11), while preserving diagnostic accuracy. Two of the three readers detected more malignant lesions (63/69 vs 59/69 and 64/69 vs 62/69) with the AI-CAD. 
The AI model achieved an AUC of 0.78 (95% CI: [0.72, 0.85]; P <.001), which increased for women at screening age to 0.82 (95% CI: [0.73, 0.90]; P <.001), indicating a potential for workload reduction of 20.9% at 96% sensitivity.</p><p><strong>Discussion and conclusion: </strong>Breast DWI might benefit from AI support. In our study, AI showed potential for reduction of BI-RADS-like 3 calls and increase of inter-rater agreement. However, given the limited study size, further research is needed.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145126441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
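One plausible reading of the "20.9% workload reduction at 96% sensitivity" figure is threshold-based triage: keep a sensitivity floor, then count how many examinations fall below the operating threshold. A sketch under that assumption (not necessarily the authors' exact procedure):

```python
# Triage-style workload estimate (assumed formulation): choose the highest
# threshold that still keeps >= min_sensitivity of malignant cases at or
# above it, then count the share of examinations scored below it.

def workload_reduction(scores, labels, min_sensitivity=0.96):
    positives = sorted(s for s, y in zip(scores, labels) if y == 1)
    n_pos = len(positives)
    # how many positives we are allowed to miss at this sensitivity floor
    max_missed = int(n_pos * (1 - min_sensitivity))
    threshold = positives[max_missed]
    triaged = sum(1 for s in scores if s < threshold)
    return triaged / len(scores)
```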
{"title":"A scalable framework for benchmark embedding models in semantic health-care tasks.","authors":"Shelly Soffer, Mahmud Omar, Moran Gendler, Benjamin S Glicksberg, Patricia Kovatch, Orly Efros, Robert Freeman, Alexander W Charney, Girish N Nadkarni, Eyal Klang","doi":"10.1093/jamia/ocaf149","DOIUrl":"https://doi.org/10.1093/jamia/ocaf149","url":null,"abstract":"<p><strong>Objectives: </strong>Text embeddings are promising for semantic tasks, such as retrieval augmented generation (RAG). However, their application in health care is underexplored due to a lack of benchmarking methods. We introduce a scalable benchmarking method to test embeddings for health-care semantic tasks.</p><p><strong>Materials and methods: </strong>We evaluated 39 embedding models across 7 medical semantic similarity tasks using diverse datasets. These datasets comprised real-world patient data (from the Mount Sinai Health System and MIMIC IV), biomedical texts from PubMed, and synthetic data generated with Llama-3-70b. We first assessed semantic textual similarity (STS) by correlating the model-generated similarity scores with noise levels using Spearman rank correlation. We then reframed the same tasks as retrieval problems, evaluated by mean reciprocal rank and recall at k.</p><p><strong>Results: </strong>In total, evaluating 2000 text pairs per 7 tasks for STS and retrieval yielded 3.28 million model assessments. Larger models (>7b parameters), such as those based on Mistral-7b and Gemma-2-9b, consistently performed well, especially in long-context tasks. The NV-Embed-v1 model (7b parameters), although top in short tasks, underperformed in long tasks. For short tasks, smaller models such as b1ade-embed (335M parameters) performed on par with the larger models. 
For long retrieval tasks, the larger models significantly outperformed the smaller ones.</p><p><strong>Discussion: </strong>The proposed benchmarking framework demonstrates scalability and flexibility, offering a structured approach to guide the selection of embedding models for a wide range of health-care tasks.</p><p><strong>Conclusion: </strong>By matching the appropriate model with the task, the framework enables more effective deployment of embedding models, enhancing critical applications such as semantic search and retrieval-augmented generation (RAG).</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145115030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
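The retrieval metrics named above, mean reciprocal rank and recall at k, can be sketched in a few lines (assuming, for simplicity, a single gold document per query; the benchmark's exact setup may differ):

```python
# Minimal reference implementations of the two retrieval metrics.
# ranked_lists: one ranked list of candidate ids per query.
# relevant: the single gold id per query (an assumed simplification).

def mean_reciprocal_rank(ranked_lists, relevant):
    total = 0.0
    for ranking, gold in zip(ranked_lists, relevant):
        if gold in ranking:
            total += 1.0 / (ranking.index(gold) + 1)  # rank is 1-based
    return total / len(ranked_lists)

def recall_at_k(ranked_lists, relevant, k):
    hits = sum(1 for ranking, gold in zip(ranked_lists, relevant)
               if gold in ranking[:k])
    return hits / len(ranked_lists)
```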
{"title":"Predicting treatment retention in medication for opioid use disorder: a machine learning approach using NLP and LLM-derived clinical features.","authors":"Fateme Nateghi Haredasht, Ivan Lopez, Steven Tate, Pooya Ashtari, Min Min Chan, Deepali Kulkarni, Chwen-Yuen Angie Chen, Maithri Vangala, Kira Griffith, Bryan Bunning, Adam S Miner, Tina Hernandez-Boussard, Keith Humphreys, Anna Lembke, L Alexander Vance, Jonathan H Chen","doi":"10.1093/jamia/ocaf157","DOIUrl":"https://doi.org/10.1093/jamia/ocaf157","url":null,"abstract":"<p><strong>Objective: </strong>Building upon our previous work on predicting treatment retention in medications for opioid use disorder, we aimed to improve 6-month retention prediction in buprenorphine-naloxone (BUP-NAL) therapy by incorporating features derived from large language models (LLMs) applied to unstructured clinical notes.</p><p><strong>Materials and methods: </strong>We used de-identified electronic health record (EHR) data from Stanford Health Care (STARR) for model development and internal validation, and the NeuroBlu behavioral health database for external validation. Structured features were supplemented with 13 clinical and psychosocial features extracted from free-text notes using the CLinical Entity Augmented Retrieval pipeline, which combines named entity recognition with LLM-based classification to provide contextual interpretation. We trained classification (Logistic Regression, Random Forest, XGBoost) and survival models (CoxPH, Random Survival Forest, Survival XGBoost), evaluated using Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) and C-index.</p><p><strong>Results: </strong>XGBoost achieved the highest classification performance (ROC-AUC = 0.65). Incorporating LLM-derived features improved model performance across all architectures, with the largest gains observed in simpler models such as Logistic Regression. 
In time-to-event analysis, Random Survival Forest and Survival XGBoost reached the highest C-index (≈0.65). SHapley Additive exPlanations analysis identified LLM-extracted features like Chronic Pain, Liver Disease, and Major Depression as key predictors. We also developed an interactive web tool for real-time clinical use.</p><p><strong>Discussion: </strong>Features extracted using NLP and LLM-assisted methods improved model accuracy and interpretability, revealing valuable psychosocial risks not captured in structured EHRs.</p><p><strong>Conclusion: </strong>Combining structured EHR data with LLM-extracted features moderately improves BUP-NAL retention prediction, enabling personalized risk stratification and advancing AI-driven care for substance use disorders.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145114959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
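The C-index used to compare the survival models above can be sketched as Harrell's concordance over comparable pairs (a standard formulation; the paper's exact estimator, e.g. its censoring handling, may differ):

```python
# Harrell's concordance index: among comparable pairs, the fraction where the
# patient who fails earlier is assigned the higher risk score (ties = 0.5).

def c_index(times, events, risk_scores):
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # pair (i, j) is comparable if i's event precedes j's observed time
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable
```

A C-index of ~0.65, as reported, sits between chance (0.5) and perfect ranking (1.0).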
{"title":"Large language models accurately identify immunosuppression in intensive care unit patients.","authors":"Vijeeth Guggilla, Mengjia Kang, Melissa J Bak, Steven D Tran, Anna Pawlowski, Prasanth Nannapaneni, Luke V Rasmussen, Daniel Schneider, Helen K Donnelly, Ankit Agrawal, David Liebovitz, Alexander V Misharin, G R Scott Budinger, Richard G Wunderink, Theresa L Walunas, Catherine A Gao","doi":"10.1093/jamia/ocaf141","DOIUrl":"10.1093/jamia/ocaf141","url":null,"abstract":"<p><strong>Objective: </strong>Rule-based structured data algorithms and natural language processing (NLP) approaches applied to unstructured clinical notes have limited accuracy and poor generalizability for identifying immunosuppression. Large language models (LLMs) may effectively identify patients with heterogenous types of immunosuppression from unstructured clinical notes. We compared the performance of LLMs applied to unstructured notes for identifying patients with immunosuppressive conditions or immunosuppressive medication use against 2 baselines: (1) structured data algorithms using diagnosis codes and medication orders and (2) NLP approaches applied to unstructured notes.</p><p><strong>Materials and methods: </strong>We used hospital admission notes from a primary cohort of 827 intensive care unit (ICU) patients at Northwestern Memorial Hospital and a validation cohort of 200 ICU patients at Beth Israel Deaconess Medical Center, along with diagnosis codes and medication orders from the primary cohort. We evaluated the performance of structured data algorithms, NLP approaches, and LLMs in identifying 7 immunosuppressive conditions and 6 immunosuppressive medications.</p><p><strong>Results: </strong>In the primary cohort, structured data algorithms achieved peak F1 scores ranging from 0.30 to 0.97 for identifying immunosuppressive conditions and medications. NLP approaches achieved peak F1 scores ranging from 0 to 1. 
GPT-4o outperformed or matched structured data algorithms and NLP approaches across all conditions and medications, with F1 scores ranging from 0.51 to 1. GPT-4o also performed impressively in our validation cohort (F1 = 1 for 8/13 variables).</p><p><strong>Discussion: </strong>LLMs, particularly GPT-4o, outperformed structured data algorithms and NLP approaches in identifying immunosuppressive conditions and medications with robust external validation.</p><p><strong>Conclusion: </strong>LLMs can be applied for improved cohort identification for research purposes.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12490808/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145114981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
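The per-variable F1 comparison above reduces to precision and recall of each extraction method against chart-review labels; a minimal sketch:

```python
# Binary F1 for one condition/medication variable: predictions vs. reference
# labels (1 = immunosuppressed / on the medication, 0 = not).

def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # convention when no true positives are found
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```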
{"title":"Improving postoperative length of stay forecasting with retrieval-augmented prediction.","authors":"Brian H Park, Chun-Nan Hsu, Austin Nguyen, Ying Q Zhou, Rodney A Gabriel","doi":"10.1093/jamia/ocaf154","DOIUrl":"https://doi.org/10.1093/jamia/ocaf154","url":null,"abstract":"<p><strong>Objective: </strong>The objective of this study is to evaluate retrieval-augmented prediction for forecasting hospital length of stay (LOS) following surgery compared to traditional machine learning (ML), standalone large language models (LLMs), and retrieval-augmented generation (RAG) approaches.</p><p><strong>Materials and methods: </strong>Spine surgery cases were extracted from electronic health records. Structured features and operative notes were concatenated into natural language patient representations, embedded using Sentence-Bidirectional Encoder Representations from Transformer, and stored in a vector database. Eight predictive models were implemented, including a baseline model, standalone ML with embeddings, standalone LLM (Gemma 3:27B), and combinations of these with retrieval-augmented prediction or generation. The retrieval-augmented prediction model computed a similarity-weighted average LOS from nearest neighbors. Performance was assessed using R2, mean absolute error (MAE), and root mean square error (RMSE).</p><p><strong>Results: </strong>Retrieval-augmented prediction alone outperformed standalone ML and LLM models (R2 = 0.39, MAE = 4.47). Combining ML or LLM outputs with retrieval-augmented prediction further improved performance. The best performing model was a neural network blended with retrieval-augmented prediction (R2 = 0.52, MAE = 4.16). LLM-RAG alone reached R2 = 0.19, which improved to 0.47 when combined with retrieval-augmented predictions. 
Retrieval-augmented prediction consistently reduced MAE and RMSE by up to 32% and 38%, respectively.</p><p><strong>Discussion: </strong>Retrieval-augmented prediction offers interpretable and resource-efficient forecasting by semantically leveraging prior patient cases without generative modeling. It consistently outperformed RAG and ML across metrics, approximating clinical reasoning via similarity-based inference.</p><p><strong>Conclusion: </strong>Retrieval-augmented prediction significantly enhances LOS prediction accuracy over standard ML and LLM models. Its interpretability and scalability make it a promising solution for integrating predictive analytics into clinical workflows.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145092790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
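The core retrieval-augmented prediction step, a similarity-weighted average LOS over retrieved neighbors, can be sketched as follows; cosine similarity over toy embeddings stands in for the vector-database lookup, and this is not the authors' exact pipeline:

```python
# Similarity-weighted LOS prediction over retrieved neighbor cases.
# Embeddings and LOS values here are toy inputs for illustration.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rap_predict(query_emb, neighbor_embs, neighbor_los):
    """Predict LOS as the similarity-weighted average of neighbor LOS."""
    sims = [cosine(query_emb, e) for e in neighbor_embs]
    total = sum(sims)
    return sum(s * los for s, los in zip(sims, neighbor_los)) / total
```

A neighbor identical to the query dominates the weighted average, which is the "similar prior cases" intuition the Discussion describes.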
{"title":"Hillclimb-Causal Inference: a data-driven approach to identify causal pathways among parental behaviors, genetic risk, and externalizing behaviors in children.","authors":"Mengman Wei, Qian Peng","doi":"10.1093/jamia/ocaf153","DOIUrl":"https://doi.org/10.1093/jamia/ocaf153","url":null,"abstract":"<p><strong>Objectives: </strong>Externalizing behaviors in children, such as aggression, hyperactivity, and defiance, are influenced by complex interplays between genetic predispositions and environmental factors, particularly parental behaviors. Unraveling these intricate causal relationships can benefit from the use of robust data-driven methods.</p><p><strong>Materials and methods: </strong>We developed \"Hillclimb-Causal Inference,\" a causal discovery approach that integrates the Hill Climb Search algorithm with a customized Linear Gaussian Bayesian Information Criterion (BIC). This method was applied to data from the Adolescent Brain Cognitive Development (ABCD) Study, which included parental behavior assessments, children's genotypes, and externalizing behavior measures. We performed dimensionality reduction to address multicollinearity among parental behaviors and assessed children's genetic risk for externalizing disorders using polygenic risk scores (PRS), which were computed based on GWAS summary statistics from independent cohorts. Once the causal pathways were identified, we employed structural equation modeling (SEM) to quantify the relationships within the model.</p><p><strong>Results: </strong>We identified prominent causal pathways linking parental behaviors to children's externalizing outcomes. Parental alcohol misuse and broader behavioral issues exhibited notably stronger direct effects (0.33 and 0.20, respectively) compared to children's PRS (0.07). Moreover, when considering both direct and indirect paths, parental substance misuse (alcohol, drugs, and tobacco) collectively resulted in a total effect exceeding 1.1 on externalizing behaviors. 
Bootstrap and sensitivity analyses further validated the robustness of these findings.</p><p><strong>Discussion and conclusion: </strong>Parental behaviors exert larger effects on children's externalizing outcomes than genetic risk, suggesting potential targets for prevention and intervention. The Hillclimb-Causal framework provides a general, data-driven way to map causal pathways in developmental psychiatry and related domains.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145092776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
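The hill-climb-with-BIC idea can be illustrated on a toy single-edge step. This is an assumption-laden sketch (one candidate parent at a time, a simple linear-Gaussian BIC fitted by least squares), not the paper's implementation or its customized score:

```python
# Toy hill-climb step: score one variable's candidate parent sets with a
# linear-Gaussian BIC (lower is better) and greedily pick the best parent.
import math

def linear_gaussian_bic(y, x=None):
    """BIC of y ~ const (x is None) or y ~ a + b*x, fitted by least squares."""
    n = len(y)
    if x is None:
        mu = sum(y) / n
        rss = sum((v - mu) ** 2 for v in y)
        k = 1
    else:
        mx, my = sum(x) / n, sum(y) / n
        sxx = sum((v - mx) ** 2 for v in x)
        sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
        b = sxy / sxx
        a = my - b * mx
        rss = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))
        k = 2
    return n * math.log(rss / n + 1e-12) + k * math.log(n)

def best_parent(y, candidates):
    """One hill-climb move: add the edge that most improves (lowers) BIC."""
    best, best_score = None, linear_gaussian_bic(y)
    for name, x in candidates.items():
        s = linear_gaussian_bic(y, x)
        if s < best_score:
            best, best_score = name, s
    return best

# Invented data: y tracks the "x" candidate, not the "noise" candidate.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
noise = [5.0, 1.0, 4.0, 2.0, 3.0]
y = [0.1, 1.0, 2.1, 2.9, 4.0]
chosen = best_parent(y, {"x": x, "noise": noise})
```

Repeating such moves until no edge improves the score is the Hill Climb Search loop; the full method also handles edge removals, reversals, and acyclicity checks.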
{"title":"A self-report measure of digital skills needed to use digital health tools among older adults-the Skills Measurement and Readiness Training for Digital Health (SMART Digital Health) Scale.","authors":"Lina Tieu, Courtney R Lyles, Hyunjin Cindy Kim, Isabel Luna, Jeanette Wong, Naomi Lopez-Solano, Junhong Li, Andersen Yang, Jorge A Rodriguez, Oanh Kieu Nguyen, Alejandra Casillas, Emilia H De Marchis, Anita L Stewart, Torsten B Neilands, Elaine C Khoong","doi":"10.1093/jamia/ocaf151","DOIUrl":"https://doi.org/10.1093/jamia/ocaf151","url":null,"abstract":"<p><strong>Objective: </strong>To identify a brief scale to accurately assess digital skills among older adults for use in identifying need for support to use digital health tools.</p><p><strong>Materials and methods: </strong>Patients age ≥50 speaking English, Spanish, or Cantonese completed surveys (n = 186) assessing digital health access, use, and skills. A subsample (n = 101) completed observational task assessments gauging competency on 4 tasks essential to digital health skills: (1) launch a video visit from an email/text message hyperlink, (2) visit a specific health website, (3) sign up for a patient portal, and (4) log in to a patient portal. We used exploratory factor analysis, receiver operator characteristic, logistic regression, and dominance analysis methods to identify and evaluate a scale measuring digital skills essential to using digital health tools.</p><p><strong>Results: </strong>We found that a 9-item scale demonstrated unidimensionality and reliability (Cronbach's alpha 0.93) in measuring digital skills. Mean score was 19.3 out of 36. For each task, handout/video support was inadequate in facilitating completion for one-quarter of participants. 
We found high accuracy of the scale in predicting digital health competency (area under the curve 0.77-0.88).</p><p><strong>Discussion: </strong>The Skills Measurement and Readiness Training for Digital Health (SMART Digital Health) scale is a measure of digital skills with evidence of reliability and validity to be used as a diagnostic tool to identify patients requiring support to use digital health tools.</p><p><strong>Conclusion: </strong>This early work supports the identification of patients with digital literacy needs who may require interventions to effectively engage in digital health communication and management.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145092691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
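The reliability statistic reported above (Cronbach's alpha 0.93) can be reproduced in form with a short computation; the item scores below are toy data, not the SMART Digital Health responses:

```python
# Cronbach's alpha for a k-item scale: compares summed per-item variance
# against the variance of respondents' total scores.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding all respondents' scores."""
    k = len(item_scores)
    item_var = sum(variance(item) for item in item_scores)
    totals = [sum(vals) for vals in zip(*item_scores)]  # per-respondent totals
    return (k / (k - 1)) * (1 - item_var / variance(totals))
```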