Ethan Ethan, Robert Gallo, Eric Strong, Yingjie Weng, Hannah Kerman, Jason Freed, Josephine A Cool, Zahir Kanjee, Kathleen Lane, Andrew S Parsons, Neera Ahuja, Eric Horvitz, Daniel Yang, Arnold Milstein, Andrew PJ Olson, Jason Hom, Jonathan H. Chen, Adam Rodman
{"title":"Large Language Model Influence on Management Reasoning: A Randomized Controlled Trial","authors":"Ethan Ethan, Robert Gallo, Eric Strong, Yingjie Weng, Hannah Kerman, Jason Freed, Josephine A Cool, Zahir Kanjee, Kathleen Lane, Andrew S Parsons, Neera Ahuja, Eric Horvitz, Daniel Yang, Arnold Milstein, Andrew PJ Olson, Jason Hom, Jonathan H. Chen, Adam Rodman","doi":"10.1101/2024.08.05.24311485","DOIUrl":"https://doi.org/10.1101/2024.08.05.24311485","url":null,"abstract":"Importance: Large language model (LLM) artificial intelligence (AI) systems have shown promise in diagnostic reasoning, but their utility in management reasoning with no clear right answers is unknown. Objective: To determine whether LLM assistance improves physician performance on open-ended management reasoning tasks compared to conventional resources. Design: Prospective, randomized controlled trial conducted from 30 November 2023 to 21 April 2024. Setting: Multi-institutional study from Stanford University, Beth Israel Deaconess Medical Center, and the University of Virginia involving physicians from across the United States. Participants: 92 practicing attending physicians and residents with training in internal medicine, family medicine, or emergency medicine. Intervention: Five expert-developed clinical case vignettes were presented with multiple open-ended management questions and scoring rubrics created through a Delphi process. Physicians were randomized to use either GPT-4 via ChatGPT Plus in addition to conventional resources (e.g., UpToDate, Google), or conventional resources alone. Main Outcomes and Measures: The primary outcome was difference in total score between groups on expert-developed scoring rubrics. Secondary outcomes included domain-specific scores and time spent per case. Results: Physicians using the LLM scored higher compared to those using conventional resources (mean difference 6.5%, 95% CI 2.7-10.2, p<0.001).
Significant improvements were seen in management decisions (6.1%, 95% CI 2.5-9.7, p=0.001), diagnostic decisions (12.1%, 95% CI 3.1-21.0, p=0.009), and case-specific (6.2%, 95% CI 2.4-9.9, p=0.002) domains. GPT-4 users spent more time per case (mean difference 119.3 seconds, 95% CI 17.4-221.2, p=0.02). There was no significant difference between GPT-4-augmented physicians and GPT-4 alone (-0.9%, 95% CI -9.0 to 7.2, p=0.8). Conclusions and Relevance: LLM assistance improved physician management reasoning compared to conventional resources, with particular gains in contextual and patient-specific decision-making. These findings indicate that LLMs can augment management decision-making in complex cases. Trial Registration: ClinicalTrials.gov Identifier: NCT06208423; https://classic.clinicaltrials.gov/ct2/show/NCT06208423","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"369 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141931949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning for comprehensive interaction modelling improves disease risk prediction in the UK Biobank","authors":"Heli Julkunen, Juho Rousu","doi":"10.1101/2024.08.07.24311604","DOIUrl":"https://doi.org/10.1101/2024.08.07.24311604","url":null,"abstract":"Understanding how risk factors interact to jointly influence disease risk can provide insights into disease development and improve risk prediction. We introduce survivalFM, a machine learning extension to the widely used Cox proportional hazards model that incorporates estimation of all potential pairwise interaction effects on time-to-event outcomes. The method relies on learning a low-rank factorized approximation of the interaction effects, hence overcoming the computational and statistical limitations of fitting these terms in models involving many predictor variables. The resulting model is fully interpretable, providing access to the estimates of both individual effects and the approximated interactions. Comprehensive evaluation of survivalFM using the UK Biobank dataset across ten disease examples and a variety of clinical risk factors and omics data modalities shows improved discrimination and reclassification performance (65% and 97.5% of the scenarios tested, respectively). Considering a clinical scenario of cardiovascular risk prediction using predictors from the established QRISK3 model, we further show that the comprehensive interaction modelling adds predictive value beyond the individual and age interaction effects currently included.
These results demonstrate that comprehensive modelling of interactions can facilitate advanced insights into disease development and improve risk predictions.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141931951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christopher James Duckworth, Dan K Burns, Carlos Lamas-Fernandez, Mark Wright, Rachael Leyland, Matthew Stammers, Michael George, Michael Boniface
{"title":"Predicting onward care needs at admission to reduce discharge delay using machine learning","authors":"Christopher James Duckworth, Dan K Burns, Carlos Lamas-Fernandez, Mark Wright, Rachael Leyland, Matthew Stammers, Michael George, Michael Boniface","doi":"10.1101/2024.08.07.24311596","DOIUrl":"https://doi.org/10.1101/2024.08.07.24311596","url":null,"abstract":"Early identification of patients who require onward referral for social care can prevent delays to discharge from hospital. We introduce a machine learning (ML) model to identify potential social care needs at the first point of admission. The model performance is comparable to clinician's predictions of discharge care needs, despite working with only a subset of the information available to the clinician. We find that ML and clinician perform better for identifying different types of care needs, highlighting the added value of a potential system supporting decision making. We also demonstrate the ability for ML to provide automated initial discharge need assessments, in the instance where initial clinical assessment is delayed. Finally, we demonstrate that combining clinician and machine predictions, in a hybrid model, provides even more accurate early predictions of onward social care requirements and demonstrates the potential for human-in-the-loop decision support systems in clinical practice.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141931950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Richard Williams, Thomas Bolton, David Jenkins, Mehrdad A Mizani, Matthew Sperrin, Cathie Sudlow, Angela Wood, Adrian Heald, Niels Peek, CVD-COVID-UK/COVID-IMPACT Consortium
{"title":"The challenges of replication: a worked example of methods reproducibility using electronic health record data","authors":"Richard Williams, Thomas Bolton, David Jenkins, Mehrdad A Mizani, Matthew Sperrin, Cathie Sudlow, Angela Wood, Adrian Heald, Niels Peek, CVD-COVID-UK/COVID-IMPACT Consortium","doi":"10.1101/2024.08.06.24311535","DOIUrl":"https://doi.org/10.1101/2024.08.06.24311535","url":null,"abstract":"The ability to reproduce the work of others is an essential part of the scientific disciplines. However, in practice it is hard, with several authors describing a \"replication crisis\" in research. For observational studies using electronic health record (EHR) data, replication is also important. However, replicating observational studies using EHR data can be challenging for many reasons, including complexities in data access, variations in EHR systems across institutions, and the potential for confounding variables that may not be fully accounted for. Observational research is typically given less weight in systematic reviews and clinical guidelines, in favour of more conclusive research such as randomised controlled trials. Observational research that is replicable has more impact. In this study we aimed to replicate a previous study that had examined the risk of hospitalisation following a positive COVID-19 test in individuals with diabetes. Using EHR data from the NHS England's Secure Data Environment covering the whole of England, UK (population 57m), we sought to replicate findings from the original study, which used data from Greater Manchester (a large urban region in the UK, population 2.9m). Both analyses were conducted in Trusted Research Environments (TREs) or Secure Data Environments (SDEs), containing linked primary and secondary care data. However, the small differences between the environments and the data sources led to several challenges in assessing reproducibility.
In this paper we describe the differences between the environments, reflect on the challenges faced, and produce a list of recommendations for TREs and SDEs to assist future replication studies.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141931952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haoxin Chen, Will Simmons, Malak Hashish, Jiancheng Ye
{"title":"Telehealth Utilization and Patient Experiences: The Role of Social Determinants of Health Among Individuals with Hypertension and Diabetes","authors":"Haoxin Chen, Will Simmons, Malak Hashish, Jiancheng Ye","doi":"10.1101/2024.08.01.24311392","DOIUrl":"https://doi.org/10.1101/2024.08.01.24311392","url":null,"abstract":"Objective: To evaluate the utilization patterns, effectiveness, and patient satisfaction of telehealth services among individuals with hypertension and/or diabetes, and to investigate the influence of social determinants of health (SDOH) on telehealth access and utilization in this population. Methods: We conducted a cross-sectional analysis using data from the 2022 Health Information National Trends Survey (HINTS 6) by the National Cancer Institute. The study sample included 3,009 respondents with self-reported diabetes, hypertension, or both conditions. Telehealth usage was assessed through 14 survey questions, and participant characteristics were analyzed using sociodemographic, baseline health, and SDOH data. Results: Of the 6,252 HINTS 6 survey respondents, 3,009 met the inclusion criteria. Significant sociodemographic differences were observed across the diabetes and/or hypertension groups. No significant differences were found in telehealth usage among the groups, with 43.9% of respondents utilizing telehealth in the past year. Common reasons for telehealth use included provider recommendation, convenience, and infection avoidance. Social determinants of health, such as food insecurity and transportation issues, were more prevalent among individuals with both conditions, though no significant differences in telehealth experiences were noted across groups. Conclusion: Telehealth shows potential for managing chronic conditions like hypertension and diabetes, demonstrating substantial adoption and universal accessibility.
However, disparities influenced by SDOH highlight the need for targeted interventions to ensure equitable access. Addressing privacy concerns, leveraging healthcare providers' recommendations, and tackling SDOH barriers are crucial for fostering wider telehealth adoption and improving outcomes. Future research should focus on the long-term impacts of telehealth and further investigate SDOH factors to develop tailored interventions that enhance engagement and equitable access across diverse patient populations.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141931953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ashley Lewis, Yash Samir Khandwala, Tina Hernandez-Boussard, James Brooks
{"title":"SDoH-Aware Approach to Prostate Cancer Screening: Addressing Overdiagnosis of Prostate Cancer using PSA","authors":"Ashley Lewis, Yash Samir Khandwala, Tina Hernandez-Boussard, James Brooks","doi":"10.1101/2024.07.31.24311297","DOIUrl":"https://doi.org/10.1101/2024.07.31.24311297","url":null,"abstract":"This study investigates the potential of multimodal data for prostate cancer (PCa) risk prediction using the All of Us (AoU) research program dataset. By integrating polygenic risk scores (PRSs) with diverse clinical, survey, and genomic data, we developed a model that identifies established PCa risk factors, such as age and family history, and a novel factor: recent healthcare visits are linked to reduced risk. The model's performance, notably the false positive rate, is improved compared to traditional methods, despite the lack of Prostate-Specific Antigen (PSA) data. The findings demonstrate that incorporating comprehensive multimodal data from AoU can enhance PCa risk prediction and provide a robust framework for future clinical applications.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"34 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141883408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Amanda Momenzadeh, Caleb W Cranney, So Yung Choi, Catherine Bresee, Mourad Tighiouart, Roma Gianchandani, Joshua Pevnick, Jason Moore, Jesse Meyer
{"title":"Medications that Regulate Gastrointestinal Transit Influence Inpatient Blood Glucose","authors":"Amanda Momenzadeh, Caleb W Cranney, So Yung Choi, Catherine Bresee, Mourad Tighiouart, Roma Gianchandani, Joshua Pevnick, Jason Moore, Jesse Meyer","doi":"10.1101/2024.07.31.24311287","DOIUrl":"https://doi.org/10.1101/2024.07.31.24311287","url":null,"abstract":"Objective: A multitude of factors affect a hospitalized individual's blood glucose (BG), making BG difficult to predict and manage. Beyond medications well established to alter BG, such as beta-blockers, there are likely many medications with undiscovered effects on BG variability. Identification of these medications and the strength and timing of these relationships has potential to improve glycemic management and patient safety. Materials and Methods: EHR data from 103,871 inpatient encounters over 8 years within a large, urban health system was used to extract over 500 medications, laboratory measurements, and clinical predictors of BG. Feature selection was performed using an optimized Lasso model with repeated 5-fold cross-validation on the 80% training set, followed by a linear mixed regression model to evaluate statistical significance. Significant medication predictors were then evaluated for novelty against a comprehensive adverse drug event database. Results: We found 29 statistically significant features associated with BG; 24 were medications including 10 medications not previously documented to alter BG. The remaining five factors were Black/African American race, history of type 2 diabetes mellitus, prior BG (mean and last) and creatinine. Discussion: The unexpected medications, including several agents involved in gastrointestinal motility, found to affect BG were supported by available studies. This study may bring to light medications to use with caution in individuals with hyper- or hypoglycemia.
Further investigation of these potential candidates is needed to enhance clinical utility of these findings. Conclusion: This study uniquely identifies medications involved in gastrointestinal transit to be predictors of BG that may not be well established and recognized in clinical practice.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"54 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141883407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Amir Bahmani, Kexin Cha, Arash Alavi, Amit Dixit, Antony Ross, Ryan Park, Francesca Goncalves, Shirley Ma, Paul Saxman, Ramesh Nair, Ramin Akhavan Sarraf, Xin Zhou, Meng Wang, Kevin Contrepois, Jennifer Li Pook Than, Emma Monte, David Jose Florez Rodriguez, Jaslene Lai, Mohan Babu, Abtin Tondar, Sophia Miryam Schussler-Fiorenza Rose, Ilya Akbari, Xinyue Zhang, Kritika Yegnashankaran, Joseph Yracheta, Kali Dale, Alison Derbenwick Miller, Scott Edmiston, Eva M McGhee, Camille Nebeker, Joseph C Wu, Anshul Kundaje, Michael Snyder
{"title":"Achieving Inclusive Healthcare through Integrating Education and Research with AI and Personalized Curricula","authors":"Amir Bahmani, Kexin Cha, Arash Alavi, Amit Dixit, Antony Ross, Ryan Park, Francesca Goncalves, Shirley Ma, Paul Saxman, Ramesh Nair, Ramin Akhavan Sarraf, Xin Zhou, Meng Wang, Kevin Contrepois, Jennifer Li Pook Than, Emma Monte, David Jose Florez Rodriguez, Jaslene Lai, Mohan Babu, Abtin Tondar, Sophia Miryam Schussler-Fiorenza Rose, Ilya Akbari, Xinyue Zhang, Kritika Yegnashankaran, Joseph Yracheta, Kali Dale, Alison Derbenwick Miller, Scott Edmiston, Eva M McGhee, Camille Nebeker, Joseph C Wu, Anshul Kundaje, Michael Snyder","doi":"10.1101/2024.07.31.24311182","DOIUrl":"https://doi.org/10.1101/2024.07.31.24311182","url":null,"abstract":"Precision medicine promises significant health benefits but faces challenges such as the need for complex data management and analytics, interdisciplinary collaboration, and education of researchers, healthcare professionals, and participants. Addressing these needs requires the integration of computational experts, engineers, designers, and healthcare professionals to develop user-friendly systems and shared terminologies. The widespread adoption of large language models (LLMs) like GPT-4 and Claude 3 highlights the importance of making complex data accessible to non-specialists. The Stanford Data Ocean (SDO) strives to mitigate these challenges through a scalable, cloud-based platform that supports data management for various data types, advanced research, and personalized learning in precision medicine. SDO provides AI tutors and AI-powered data visualization tools to enhance educational and research outcomes and make data analysis accessible for users from diverse educational backgrounds. 
By extending engagement and cutting-edge research capabilities globally, SDO particularly benefits economically disadvantaged and historically marginalized communities, fostering interdisciplinary biomedical research and bridging the gap between education and practical application in the biomedical field.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"86 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141867169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marirena Bafaloukou, Ann-Kathrin Schalkamp, Nan Victoria Fletcher-Lloyd, Alexander Capstick, Chloe Walsh, Cynthia Sandor, Samaneh Kouchaki, Ramin Nilforooshan, Payam Barnaghi
{"title":"An Interpretable Machine Learning Tool for In-Home Screening of Agitation Episodes in People Living with Dementia","authors":"Marirena Bafaloukou, Ann-Kathrin Schalkamp, Nan Victoria Fletcher-Lloyd, Alexander Capstick, Chloe Walsh, Cynthia Sandor, Samaneh Kouchaki, Ramin Nilforooshan, Payam Barnaghi","doi":"10.1101/2024.07.30.24311178","DOIUrl":"https://doi.org/10.1101/2024.07.30.24311178","url":null,"abstract":"Background: Agitation affects around 30% of people living with dementia (PLwD), increasing carer burden and straining care services. Agitation screening typically relies on subjective clinical scales and direct patient observation, which are resource-intensive and challenging to incorporate into routine care. Clinical applicability of data-driven methods for agitation screening is limited by constraints such as short observational periods, data granularity, and lack of interpretability and generalisability. Current interventions for agitation are primarily medication-based, which may lead to severe side effects and lack personalisation. Understanding how real-world factors affect agitation within home settings offers a promising avenue towards identifying potential personalised non-pharmacological interventions. Methods: We used longitudinal data (32,896 person-days from n=63 PLwD) collected using in-home monitoring devices. Employing machine learning techniques, we developed a screening tool to determine the weekly risk of agitation. We incorporated a traffic-light system for risk stratification to aid clinical decision-making and employed the SHapley Additive exPlanations (SHAP) framework to increase interpretability. We designed an interactive tool that enables the exploration of personalised non-pharmacological interventions, such as modifying ambient light and temperature.
Results: Light Gradient-boosting Machine (LightGBM) achieved the highest performance in identifying agitation with a sensitivity of 71.32±7.38% and specificity of 75.28±10.43%. Implementing the traffic-light system for risk stratification increased specificity by 15% and improved all metrics. Significant contributors to agitation included low nocturnal respiratory rate, heightened alertness during sleep, and increased indoor illuminance, as revealed by statistical and feature importance analysis. Using our interactive tool, we identified that adjusting indoor lighting levels and temperature were promising and feasible interventions within our cohort. Conclusions: Our interpretable framework for agitation screening, developed using data from a dementia care study, showcases significant clinical value. The accompanying interactive interface allows for the in-silico simulation of non-pharmacological interventions, facilitating the design of personalised interventions that can improve in-home dementia care.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"208 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141867167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Arash Alavi, Kexin Cha, Delara P Esfarjani, Bhavesh Patel, Jennifer Li Pook Than, Aaron Y Lee, Camille Nebeker, Michael Snyder, Amir Bahmani
{"title":"Perspective on Harnessing Large Language Models to Uncover Insights in Diabetes Wearable Data","authors":"Arash Alavi, Kexin Cha, Delara P Esfarjani, Bhavesh Patel, Jennifer Li Pook Than, Aaron Y Lee, Camille Nebeker, Michael Snyder, Amir Bahmani","doi":"10.1101/2024.07.29.24310315","DOIUrl":"https://doi.org/10.1101/2024.07.29.24310315","url":null,"abstract":"Large Language Models (LLMs) have gained significant attention and are increasingly used by researchers. Concurrently, publicly accessible datasets containing individual-level health information are becoming more available. Some of these datasets, such as the recently released Artificial Intelligence Ready and Equitable Atlas for Diabetes Insights (AI-READI) dataset, include individual-level data from digital wearable technologies. The application of LLMs to gain insights about health from wearable sensor data specific to diabetes is underexplored. This study presents a comprehensive evaluation of multiple LLMs, including GPT-3.5, GPT-4, GPT-4o, Gemini, Gemini 1.5 Pro, and Claude 3 Sonnet, on various diabetes research tasks using diverse prompting methods to evaluate their performance and gain new insights into diabetes and glucose dysregulation. Notably, GPT-4o showed promising performance across tasks with a chain-of-thought prompt design (aggregate performance score of 95.5%). Moreover, using this model, we identified new insights from the dataset, such as the heightened sensitivity to stress among diabetic participants during glucose level fluctuations, which underscores the complex interplay between metabolic and psychological factors. These results demonstrate that LLMs can enhance the pace of discovery and also enable automated interpretation of data for users of wearable devices, including both the research team and the individual wearing the device. 
Meanwhile, we also emphasize the critical limitations, such as privacy and ethical risks and dataset biases, that must be resolved for real-world application in diabetes health settings. This study highlights the potential and challenges of integrating LLMs into diabetes research and, more broadly, wearables, paving the way for future healthcare advancements, particularly in disadvantaged communities.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"149 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141867171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}