Importance of sample size on the quality and utility of AI-based prediction models for healthcare
Prof Richard D Riley PhD, Joie Ensor PhD, Kym I E Snell PhD, Lucinda Archer PhD, Rebecca Whittle PhD, Paula Dhiman PhD, Joseph Alderman MBChB, Xiaoxuan Liu PhD, Laura Kirton MSc, Jay Manson-Whitton, Maarten van Smeden PhD, Prof Karel G Moons PhD, Prof Krishnarajah Nirantharakumar MD, Prof Jean-Baptiste Cazier PhD, Prof Alastair K Denniston PhD, Prof Ben Van Calster PhD, Prof Gary S Collins PhD
Lancet Digital Health, volume 7, issue 6, Article 100857. Published 2025-06-01.
DOI: 10.1016/j.landig.2025.01.013
URL: https://www.sciencedirect.com/science/article/pii/S2589750025000214
Impact factor: 23.8; JCR Q1 (Medical Informatics)
Citations: 0
Abstract
Rigorous study design and analytical standards are required to generate reliable findings in healthcare from artificial intelligence (AI) research. One crucial but often overlooked aspect is the determination of appropriate sample sizes for studies developing AI-based prediction models for individual diagnosis or prognosis. Specifically, the number of participants and outcome events required in datasets for model training and evaluation remains inadequately addressed. Most AI studies do not provide a rationale for their chosen sample sizes and frequently rely on datasets that are inadequate for training or evaluating a clinical prediction model. Among the ten principles of Good Machine Learning Practice established by the US Food and Drug Administration, the UK Medicines and Healthcare products Regulatory Agency, and Health Canada, guidance on sample size is directly relevant to at least three principles. To reinforce this recommendation, we outline seven reasons why inadequate sample size negatively affects model training, evaluation, and performance. Using a range of examples, we illustrate these issues and discuss the potentially harmful consequences for patient care and clinical adoption. Additionally, we address challenges associated with increasing sample sizes in AI research and highlight existing approaches and software for calculating the minimum sample sizes required for model training and evaluation.
About the journal:
The Lancet Digital Health publishes important, innovative, and practice-changing research on any topic connected with digital technology in clinical medicine, public health, and global health.
The journal’s open access content crosses subject boundaries, building bridges between health professionals and researchers. By bringing together the most important advances in this multidisciplinary field, The Lancet Digital Health is the most prominent publishing venue in digital health.
We publish a range of content types including Articles, Review, Comment, and Correspondence, helping to promote digital technologies in health practice worldwide.