Kirsten Zantvoort, Barbara Nacke, Dennis Görlich, Silvan Hornstein, Corinna Jacobi, Burkhardt Funk
{"title":"Estimation of minimal data sets sizes for machine learning predictions in digital mental health interventions","authors":"Kirsten Zantvoort, Barbara Nacke, Dennis Görlich, Silvan Hornstein, Corinna Jacobi, Burkhardt Funk","doi":"10.1038/s41746-024-01360-w","DOIUrl":"https://doi.org/10.1038/s41746-024-01360-w","url":null,"abstract":"Artificial intelligence promises to revolutionize mental health care, but small dataset sizes and lack of robust methods raise concerns about result generalizability. To provide insights on minimal necessary data set sizes, we explore domain-specific learning curves for digital intervention dropout predictions based on 3654 users from a single study (ISRCTN13716228, 26/02/2016). Prediction performance is analyzed based on dataset size (N = 100–3654), feature groups (F = 2–129), and algorithm choice (from Naive Bayes to Neural Networks). The results substantiate the concern that small datasets (N ≤ 300) overestimate predictive power. For uninformative feature groups, in-sample prediction performance was negatively correlated with dataset size. Sophisticated models overfitted in small datasets but maximized holdout test results in larger datasets. While N = 500 mitigated overfitting, performance did not converge until N = 750–1500. Consequently, we propose minimum dataset sizes of N = 500–1000.
As such, this study offers an empirical reference for researchers designing or interpreting AI studies on Digital Mental Health Intervention data.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"91 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142841352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elizabeth J. Enichen, Kimia Heydari, Joseph C. Kvedar
{"title":"Assessing alternative strategies for measuring metabolic risk","authors":"Elizabeth J. Enichen, Kimia Heydari, Joseph C. Kvedar","doi":"10.1038/s41746-024-01376-2","DOIUrl":"https://doi.org/10.1038/s41746-024-01376-2","url":null,"abstract":"Qiao et al. recently investigated the ability of dual-energy X-ray absorptiometry (DXA) scans and a smartphone app to provide detailed body composition and shape data. In a healthcare system that continues to rely on crude and stigmatizing measurements like body-mass index (BMI), their findings point to the potential of newer technologies to capture markers (i.e., visceral adiposity and fat distribution patterns) that provide clearer insights into metabolic health.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"23 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142841026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chin Siang Ong, Nicholas T. Obey, Yanan Zheng, Arman Cohan, Eric B. Schneider
{"title":"SurgeryLLM: a retrieval-augmented generation large language model framework for surgical decision support and workflow enhancement","authors":"Chin Siang Ong, Nicholas T. Obey, Yanan Zheng, Arman Cohan, Eric B. Schneider","doi":"10.1038/s41746-024-01391-3","DOIUrl":"https://doi.org/10.1038/s41746-024-01391-3","url":null,"abstract":"SurgeryLLM, a large language model framework using Retrieval Augmented Generation, demonstrably incorporated domain-specific knowledge from current evidence-based surgical guidelines when presented with patient-specific data. The successful incorporation of guideline-based information represents a substantial step toward enabling greater surgeon efficiency, improving patient safety, and optimizing surgical outcomes.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"83 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142849163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel S. Herman, Jenna T. Reece, Gary E. Weissman
{"title":"Lessons for local oversight of AI in medicine from the regulation of clinical laboratory testing","authors":"Daniel S. Herman, Jenna T. Reece, Gary E. Weissman","doi":"10.1038/s41746-024-01369-1","DOIUrl":"10.1038/s41746-024-01369-1","url":null,"abstract":"Current regulatory frameworks for artificial intelligence-based clinical decision support (AICDS) are insufficient to ensure safety, effectiveness, and equity at the bedside. The oversight of clinical laboratory testing, which requires federal- and hospital-level involvement, offers many instructive lessons for how to balance safety and innovation and warnings regarding the fragility of this balance. We propose an AICDS oversight framework, modeled after clinical laboratory regulation, that is deliberative, inclusive, and collaborative.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":" ","pages":"1-6"},"PeriodicalIF":12.4,"publicationDate":"2024-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01369-1.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142811387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dennis Fast, Lisa C. Adams, Felix Busch, Conor Fallon, Marc Huppertz, Robert Siepmann, Philipp Prucker, Nadine Bayerl, Daniel Truhn, Marcus Makowski, Alexander Löser, Keno K. Bressem
{"title":"Autonomous medical evaluation for guideline adherence of large language models","authors":"Dennis Fast, Lisa C. Adams, Felix Busch, Conor Fallon, Marc Huppertz, Robert Siepmann, Philipp Prucker, Nadine Bayerl, Daniel Truhn, Marcus Makowski, Alexander Löser, Keno K. Bressem","doi":"10.1038/s41746-024-01356-6","DOIUrl":"10.1038/s41746-024-01356-6","url":null,"abstract":"Autonomous Medical Evaluation for Guideline Adherence (AMEGA) is a comprehensive benchmark designed to evaluate large language models’ adherence to medical guidelines across 20 diagnostic scenarios spanning 13 specialties. It includes an evaluation framework and methodology to assess models’ capabilities in medical reasoning, differential diagnosis, treatment planning, and guideline adherence, using open-ended questions that mirror real-world clinical interactions. It includes 135 questions and 1337 weighted scoring elements designed to assess comprehensive medical knowledge. In tests of 17 LLMs, GPT-4 scored highest with 41.9/50, followed closely by Llama-3 70B and WizardLM-2-8x22B. For comparison, a recent medical graduate scored 25.8/50. The benchmark introduces novel content to avoid the issue of LLMs memorizing existing medical data. 
AMEGA’s publicly available code supports further research in AI-assisted clinical decision-making, aiming to enhance patient care by aiding clinicians in diagnosis and treatment under time constraints.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":" ","pages":"1-14"},"PeriodicalIF":12.4,"publicationDate":"2024-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01356-6.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142809725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anna Reuter, Mohammed K. Ali, Viswanathan Mohan, Lydia Chwastiak, Kavita Singh, K. M. Venkat Narayan, Dorairaj Prabhakaran, Nikhil Tandon, Nikkil Sudharsanan
{"title":"Predicting control of cardiovascular disease risk factors in South Asia using machine learning","authors":"Anna Reuter, Mohammed K. Ali, Viswanathan Mohan, Lydia Chwastiak, Kavita Singh, K. M. Venkat Narayan, Dorairaj Prabhakaran, Nikhil Tandon, Nikkil Sudharsanan","doi":"10.1038/s41746-024-01353-9","DOIUrl":"10.1038/s41746-024-01353-9","url":null,"abstract":"A substantial share of patients at risk of developing cardiovascular disease (CVD) fail to achieve control of CVD risk factors, but clinicians lack a structured approach to identify these patients. We applied machine learning to longitudinal data from two completed randomized controlled trials among 1502 individuals with diabetes in urban India and Pakistan. Using commonly available clinical data, we predict each individual’s risk of failing to achieve CVD risk factor control goals or meaningful improvements in risk factors at one year after baseline. When classifying those in the top quartile of predicted risk scores as at risk of failing to achieve goals or meaningful improvements, the precision for not achieving goals was 73% for HbA1c, 30% for SBP, and 24% for LDL, and for not achieving meaningful improvements 88% for HbA1c, 87% for SBP, and 85% for LDL. 
Such models could be integrated into routine care and enable efficient and targeted delivery of health resources in resource-constrained settings.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":" ","pages":"1-10"},"PeriodicalIF":12.4,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01353-9.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142797101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adam Marcus, Grant Mair, Liang Chen, Charles Hallett, Claudia Ghezzou Cuervas-Mons, Dylan Roi, Daniel Rueckert, Paul Bentley
{"title":"Deep learning biomarker of chronometric and biological ischemic stroke lesion age from unenhanced CT","authors":"Adam Marcus, Grant Mair, Liang Chen, Charles Hallett, Claudia Ghezzou Cuervas-Mons, Dylan Roi, Daniel Rueckert, Paul Bentley","doi":"10.1038/s41746-024-01325-z","DOIUrl":"10.1038/s41746-024-01325-z","url":null,"abstract":"Estimating progression of acute ischemic brain lesions – or biological lesion age - holds huge practical importance for hyperacute stroke management. The current best method for determining lesion age from non-contrast computerised tomography (NCCT), measures Relative Intensity (RI), termed Net Water Uptake (NWU). We optimised lesion age estimation from NCCT using a convolutional neural network – radiomics (CNN-R) model trained upon chronometric lesion age (Onset Time to Scan: OTS), while validating against chronometric and biological lesion age in external datasets (N = 1945). Coefficients of determination (R²) for OTS prediction, using CNN-R, and RI models were 0.58 and 0.32 respectively; while CNN-R estimated OTS showed stronger associations with ischemic core:penumbra ratio, than RI and chronometric, OTS (ρ² = 0.37, 0.19, 0.11); and with early lesion expansion (regression coefficients >2x for CNN-R versus others) (all comparisons: p < 0.05).
Concluding, deep-learning analytics of NCCT lesions is approximately twice as accurate as NWU for estimating chronometric and biological lesion ages.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":" ","pages":"1-10"},"PeriodicalIF":12.4,"publicationDate":"2024-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01325-z.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142778658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frank P.-W. Lo, Jianing Qiu, Modou L. Jobarteh, Yingnan Sun, Zeyu Wang, Shuo Jiang, Tom Baranowski, Alex K. Anderson, Megan A. McCrory, Edward Sazonov, Wenyan Jia, Mingui Sun, Matilda Steiner-Asiedu, Gary Frost, Benny Lo
{"title":"AI-enabled wearable cameras for assisting dietary assessment in African populations","authors":"Frank P.-W. Lo, Jianing Qiu, Modou L. Jobarteh, Yingnan Sun, Zeyu Wang, Shuo Jiang, Tom Baranowski, Alex K. Anderson, Megan A. McCrory, Edward Sazonov, Wenyan Jia, Mingui Sun, Matilda Steiner-Asiedu, Gary Frost, Benny Lo","doi":"10.1038/s41746-024-01346-8","DOIUrl":"10.1038/s41746-024-01346-8","url":null,"abstract":"We have developed a population-level method for dietary assessment using low-cost wearable cameras. Our approach, EgoDiet, employs an egocentric vision-based pipeline to learn portion sizes, addressing the shortcomings of traditional self-reported dietary methods. To evaluate the functionality of this method, field studies were conducted in London (Study A) and Ghana (Study B) among populations of Ghanaian and Kenyan origin. In Study A, EgoDiet’s estimations were contrasted with dietitians’ assessments, revealing a performance with a Mean Absolute Percentage Error (MAPE) of 31.9% for portion size estimation, compared to 40.1% for estimates made by dietitians. We further evaluated our approach in Study B, comparing its performance to the traditional 24-Hour Dietary Recall (24HR). Our approach demonstrated a MAPE of 28.0%, showing a reduction in error when contrasted with the 24HR, which exhibited a MAPE of 32.5%. 
This improvement highlights the potential of using passive camera technology to serve as an alternative to the traditional dietary assessment methods.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":" ","pages":"1-16"},"PeriodicalIF":12.4,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01346-8.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142778648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Karim Kadry, Shreya Gupta, Farhad R. Nezami, Elazer R. Edelman
{"title":"Probing the limits and capabilities of diffusion models for the anatomic editing of digital twins","authors":"Karim Kadry, Shreya Gupta, Farhad R. Nezami, Elazer R. Edelman","doi":"10.1038/s41746-024-01332-0","DOIUrl":"10.1038/s41746-024-01332-0","url":null,"abstract":"Numerical simulations of cardiovascular device deployment within digital twins of patient-specific anatomy can expedite and de-risk the device design process. Nonetheless, the exclusive use of patient-specific data constrains the anatomic variability that can be explored. We study how Latent Diffusion Models (LDMs) can edit digital twins to create digital siblings. Siblings can serve as the basis for comparative simulations, which can reveal how subtle anatomic variations impact device deployment, and augment virtual cohorts for improved device assessment. Using a case example centered on cardiac anatomy, we study various methods to generate digital siblings. We specifically introduce anatomic variation at different spatial scales or within localized regions, demonstrating the existence of bias toward common anatomic features. We furthermore leverage this bias for virtual cohort augmentation through selective editing, addressing issues related to dataset imbalance and diversity. 
Our framework delineates the capabilities of diffusion models in synthesizing anatomic variation for numerical simulation studies.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":" ","pages":"1-12"},"PeriodicalIF":12.4,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01332-0.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142776955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Charalampos Sotirakis, Maksymilian A. Brzezicki, Salil Patel, Niall Conway, James J. FitzGerald, Chrystalina A. Antoniades
{"title":"Predicting future fallers in Parkinson’s disease using kinematic data over a period of 5 years","authors":"Charalampos Sotirakis, Maksymilian A. Brzezicki, Salil Patel, Niall Conway, James J. FitzGerald, Chrystalina A. Antoniades","doi":"10.1038/s41746-024-01311-5","DOIUrl":"10.1038/s41746-024-01311-5","url":null,"abstract":"Parkinson’s disease (PD) increases fall risk, leading to injuries and reduced quality of life. Accurate fall risk assessment is crucial for effective care planning. Traditional assessments are subjective and time-consuming, while recent assessment methods based on wearable sensors have been limited to 1-year follow-ups. This study investigated whether a short sensor-based assessment could predict falls over up to 5 years. Data from 104 people with PD without prior falls were collected using six wearable sensors during a 2-min walk and a 30-s postural sway task. Five machine learning classifiers analysed the data. The Random Forest classifier performed best, achieving 78% accuracy (AUC = 0.85) at 60 months. Most models showed excellent performance at 24 months (AUC > 0.90, accuracy 84–92%). Walking and postural variability measures were key predictors. Adding clinicodemographic data, particularly age, improved model performance. 
Wearable sensors combined with machine learning can effectively predict fall risk, enhancing PD management and prevention strategies.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":" ","pages":"1-9"},"PeriodicalIF":12.4,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01311-5.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142776977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}