{"title":"A Large-Language Model Framework for Relative Timeline Extraction from PubMed Case Reports.","authors":"Jing Wang, Jeremy C Weiss","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Timing of clinical events is central to characterization of patient trajectories, enabling analyses such as process tracing, forecasting, and causal reasoning. However, structured electronic health records capture few data elements critical to these tasks, while clinical reports lack temporal localization of events in structured form. We present a system that transforms case reports into textual time series-structured pairs of textual events and timestamps. We contrast manual and large language model (LLM) annotations (n=320 and n=390 respectively) of ten randomly-sampled PubMed open-access (PMOA) case reports (N=152,974) and assess inter-LLM agreement (n=3,103 N=93). We find that the LLM models have moderate event recall (O1-preview: 0.80) but high temporal concordance among identified events (O1-preview: 0.95). By establishing the task, annotation, and assessment systems, and by demonstrating high concordance, this work may serve as a benchmark for leveraging the PMOA corpus for temporal analytics. Code is available at:https://github.com/jcweiss2/LLM-Timeline-PMOA/.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":"2025 ","pages":"598-606"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150726/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144276832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Generalized Tool to Assess Algorithmic Fairness in Disease Phenotype Definitions.","authors":"Jacob S Zelko, Justin Manjourides","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>For evidence from observational studies to be reliable, researchers must ensure that the patient populations of interest are accurately defined. However, disease definitions can be extremely difficult to standardize and implement accurately across different datasets and study requirements. Furthermore, in this context, they must also ensure that populations are represented fairly to accurately reflect populations' various demographic dynamics and to not overgeneralize across non-applicable populations. In this work, we present a generalized tool to assess the fairness of disease definitions by evaluating their implementation across common fairness metrics. Our approach calculates fairness metrics and provides a robust method to examine coarse and strongly intersecting populations across many characteristics. We highlight workflows when working with disease definitions, provide an example analysis using an OMOP CDM patient database, and discuss potential directions for future improvement and research.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":"2025 ","pages":"624-633"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150753/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144276831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generative AI Is Not Ready for Clinical Use in Patient Education for Lower Back Pain Patients, Even With Retrieval-Augmented Generation.","authors":"Yi-Fei Zhao, Allyn Bove, David Thompson, James Hill, Yi Xu, Yufan Ren, Andrea Hassman, Leming Zhou, Yanshan Wang","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Low back pain (LBP) is a leading cause of disability globally. Following the onset of LBP and subsequent treatment, adequate patient education is crucial for improving functionality and long-term outcomes. Despite advancements in patient education strategies, significant gaps persist in delivering personalized, evidence-based information to patients with LBP. Recent advancements in large language models (LLMs) and generative artificial intelligence (GenAI) have demonstrated the potential to enhance patient education. However, their application and efficacy in delivering educational content to patients with LBP remain underexplored and warrant further investigation. In this study, we introduce a novel approach utilizing LLMs with Retrieval-Augmented Generation (RAG) and few-shot learning to generate tailored educational materials for patients with LBP. Physical therapists manually evaluated our model responses for redundancy, accuracy, and completeness using a Likert scale. In addition, the readability of the generated education materials is assessed using the Flesch Reading Ease score. The findings demonstrate that RAG-based LLMs outperform traditional LLMs, providing more accurate, complete, and readable patient education materials with less redundancy. Having said that, our analysis reveals that the generated materials are not yet ready for use in clinical practice. This study underscores the potential of AI-driven models utilizing RAG to improve patient education for LBP; however, significant challenges remain in ensuring the clinical relevance and granularity of content generated by these models.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":"2025 ","pages":"644-653"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150711/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144276848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating Social Determinants of Health into Knowledge Graphs: Evaluating Prediction Bias and Fairness in Healthcare.","authors":"Tianqi Shang, Weiqing He, Tianlong Chen, Ying Ding, Huanmei Wu, Kaixiong Zhou, Li Shen","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Social determinants of health (SDoH) play a crucial role in patient health outcomes, yet their integration into biomedical knowledge graphs remains underexplored. This study addresses this gap by constructing an SDoH-enriched knowledge graph using the MIMIC-III dataset and PrimeKG. We introduce a novel fairness formulation for graph embeddings, focusing on invariance with respect to sensitive SDoH information. Via employing a heterogeneous-GCN model for drug-disease link prediction, we detect biases related to various SDoH factors. To mitigate these biases, we propose a post-processing method that strategically reweights edges connected to SDoHs, balancing their influence on graph representations. This approach represents one of the first comprehensive investigations into fairness issues within biomedical knowledge graphs incorporating SDoH. Our work not only highlights the importance of considering SDoH in medical informatics but also provides a concrete method for reducing SDoH-related biases in link prediction tasks, paving the way for more equitable healthcare recommendations. Our code is available at https://github.com/hwq0726/SDoH-KG.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":"2025 ","pages":"481-490"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150739/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144276853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of Machine Learning Models in Predicting Mental Health Sequelae Following Concussion in Youth.","authors":"Jin Peng, Jiayuan Chen, Changchang Yin, Ping Zhang, Jingzhen Yang","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Youth who experience concussions may be at greater risk for subsequent mental health challenges, making early detection crucial for timely intervention. This study utilized Bidirectional Long Short-Term Memory (BiLSTM) networks to predict mental health outcomes following concussion in youth and compared its performance to traditional models. We also examined whether incorporating social determinants of health (SDoH) improved predictive power, given the disproportionate impact of concussions and mental health issues on disadvantaged populations. We evaluated the models using accuracy, area under the curve (4UC) of the receiver operating characteristic (ROC), and other performance metrics. Our BiLSTM model with SDoH data achieved the highest accuracy (0.883) and 4UC-ROC score (0.892). Unlike traditional models, our approach provided real-time predictions at each visit within 12 months of the index concussion, aiding clinicians in making timely, visit-specific referrals for further treatment and interventions.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":"2025 ","pages":"422-431"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150734/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144276860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Disparate Model Performance and Stability in Machine Learning Clinical Support for Diabetes and Heart Diseases.","authors":"Ioannis Bilionis, Ricardo C Berrios, Luis Fernandez-Luque, Carlos Castillo","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Machine Learning (ML) algorithms are vital for supporting clinical decision-making in biomedical informatics. However, their predictive performance can vary across demographic groups, often due to the underrepresentation of historically marginalized populations in training datasets. The investigation reveals widespread sex- and age-related inequities in chronic disease datasets and their derived ML models. Thus, a novel analytical framework is introduced, combining systematic arbitrariness with traditional metrics like accuracy and data complexity. The analysis of data from over 25,000 individuals with chronic diseases revealed mild sex-related disparities, favoring predictive accuracy for males, and significant age-related differences, with better accuracy for younger patients. Notably, older patients showed inconsistent predictive accuracy across seven datasets, linked to higher data complexity and lower model performance. This highlights that representativeness in training data alone does not guarantee equitable outcomes, and model arbitrariness must be addressed before deploying models in clinical settings.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":"2025 ","pages":"95-104"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150696/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144276864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EntroLLM: Leveraging Entropy and Large Language Model Embeddings for Enhanced Risk Prediction with Wearable Device Data.","authors":"Xueqing Huang, Tian Gu","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Wearable devices collect complex structured data with high-dimensional and time-series features that are challenging for traditional models to handle efficiently. We propose EntroLLM, a new method that combines entropy measures and the low-dimensional representation (embedding) generated from large language models (LLMs) to enhance risk prediction using wearable device data. In EntroLLM, the entropy quantifies the variability of a subject's physical activity patterns, while the LLM embedding approximates the latent temporal structure. We evaluate the feasibility and performance of EntroLLM using NHANES data to predict overweight status using demographics and physical activity collected from wearable devices. Results show that combining entropy with GPT-based embedding improves model performance compared to baseline models and other embedding techniques, leading to an average increase in AUC from 0.56 to 0.64. EntroLLM showcases the potential of combining entropy and LLM-based embedding and offers a promising approach to wearable device data analysis for predicting health outcomes.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":"2025 ","pages":"225-234"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150754/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144276870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting survival time for critically ill patients with heart failure using conformalized survival analysis.","authors":"Xiaomeng Wang, Zhimei Ren, Jiancheng Ye","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Heart failure (HF) is a significant public health challenge, especially among critically ill patients in intensive care units (ICUs). Predicting survival outcomes for these patients with calibrated uncertainty is both challenging and essential for guiding subsequent treatments. This study introduces conformalized survival analysis (CSA) as a novel method for predicting survival times in critically ill HF patients. CSA enhances each predicted survival time with a statistically rigorous lower bound, providing valuable uncertainty quantification. Using the MIMIC-IV dataset, we demonstrate that CSA effectively delivers calibrated uncertainty quantification for survival predictions, in contrast to parametric models like the Cox or Accelerated Failure Time models. Through the application of CSA to a large, real-world dataset, this study underscores its potential to improve decision-making in critical care, offering a more precise and reliable tool for prognosis in a setting where accurate predictions and calibrated uncertainty can profoundly impact patient outcomes.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":"2025 ","pages":"576-597"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150701/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144276880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comparative Analysis of Patient Similarity Measures for Outcome Prediction.","authors":"Deyi Li, Alan S L Yu, Mei Liu","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Personalized medicine aims to improve clinical outcomes by tailoring treatments to individual patients based on genetic, phenotypic, or psychosocial characteristics, leveraging insights from similar patients. This is particularly necessary for managing diseases with significant variability in their causes, progressions and prognoses. Accurate measurement of patient similarity is crucial in this context, as it enables the identification of a high-quality cohort of similar patients, thereby enhancing clinical decision making with better evidence. However, previous studies have not comprehensively compared different patient similarity measures in large-scale retrospective analyses of electronic health records (EHRs). In this study, we conducted a comparative analysis of four patient similarity measures focusing on feature weighting mechanisms, using EHR data from 46,968 hospitalized patients. For evaluation, we assessed the patient similarity measures for predicting acute kidney injury, readmission, and mortality. Our results showed that using grid-searched weights to combine features based by their types outperformed all other methods.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":"2025 ","pages":"270-279"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150746/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144276830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Standardized Guideline for Assessing Extracted Electronic Health Records Cohorts: A Scoping Review.","authors":"Nattanit Songthangtham, Ratchada Jantraporn, Elizabeth Weinfurter, Gyorgy Simon, Wei Pan, Sripriya Rajamani, Steven G Johnson","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Assessing how accurately a cohort extracted from Electronic Health Records (EHR) represents the intended target population, or cohort fitness, is critical but often overlooked in secondary EHR data use. This scoping review aimed to (1) identify guidelines for assessing cohort fitness and (2) determine their thoroughness by examining whether they offer sufficient detail and computable methods for researchers. This scoping review follows the JBI guidance for scoping reviews and is refined based on the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for scoping reviews (PRISMA-ScR) checklists. Searches were performed in Medline, Embase, and Scopus. From 1,904 results, 30 articles and 2 additional references were reviewed. Nine articles (28.13%) include a framework for evaluating cohort fitness but only 5 (15.63%) contain sufficient details and quantitative methodologies. Overall, a more comprehensive guideline that provides best practices for measuring the cohort fitness is still needed.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":"2025 ","pages":"527-536"},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150730/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144276835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}