Importance of sample size on the quality and utility of AI-based prediction models for healthcare

IF 23.8 | CAS Tier 1 (Medicine) | Q1 MEDICAL INFORMATICS

Lancet Digital Health. Published 2025-06-01. DOI: 10.1016/j.landig.2025.01.013

Prof Richard D Riley PhD, Joie Ensor PhD, Kym I E Snell PhD, Lucinda Archer PhD, Rebecca Whittle PhD, Paula Dhiman PhD, Joseph Alderman MBChB, Xiaoxuan Liu PhD, Laura Kirton MSc, Jay Manson-Whitton, Maarten van Smeden PhD, Prof Karel G Moons PhD, Prof Krishnarajah Nirantharakumar MD, Prof Jean-Baptiste Cazier PhD, Prof Alastair K Denniston PhD, Prof Ben Van Calster PhD, Prof Gary S Collins PhD

Volume 7, issue 6, Article 100857.


Abstract

Rigorous study design and analytical standards are required to generate reliable findings in healthcare from artificial intelligence (AI) research. One crucial but often overlooked aspect is the determination of appropriate sample sizes for studies developing AI-based prediction models for individual diagnosis or prognosis. Specifically, the number of participants and outcome events required in datasets for model training and evaluation remains inadequately addressed. Most AI studies do not provide a rationale for their chosen sample sizes and frequently rely on datasets that are inadequate for training or evaluating a clinical prediction model. Among the ten principles of Good Machine Learning Practice established by the US Food and Drug Administration, the UK Medicines and Healthcare products Regulatory Agency, and Health Canada, guidance on sample size is directly relevant to at least three principles. To reinforce this recommendation, we outline seven reasons why inadequate sample size negatively affects model training, evaluation, and performance. Using a range of examples, we illustrate these issues and discuss the potentially harmful consequences for patient care and clinical adoption. Additionally, we address challenges associated with increasing sample sizes in AI research and highlight existing approaches and software for calculating the minimum sample sizes required for model training and evaluation.
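The abstract refers to existing approaches and software for calculating minimum sample sizes for model training and evaluation. As a hedged illustration only, the sketch below implements closed-form criteria for a binary outcome of the kind proposed in earlier work by Riley and colleagues (and implemented in the R and Stata packages pmsampsize and pmvalsampsize). It is an assumption that these are the specific approaches this article highlights. The function names, default targets (shrinkage 0.9, optimism 0.05, margin of error 0.05, O/E confidence interval width 0.2), and example inputs are illustrative choices, not values taken from the article. For a real study, the dedicated software should be preferred.

```python
"""Sketch of minimum sample size criteria for a binary-outcome prediction model.

Assumption: closed-form criteria in the style of Riley et al. (as used by
pmsampsize / pmvalsampsize). Illustrative only; not a validated tool.
"""
import math


def _n_for_shrinkage(p: int, r2_cs: float, s: float) -> float:
    # Sample size at which the expected uniform shrinkage factor equals s,
    # given p candidate predictor parameters and anticipated Cox-Snell R^2.
    return p / ((s - 1.0) * math.log(1.0 - r2_cs / s))


def development_sample_size(p: int, r2_cs: float, prevalence: float,
                            shrinkage: float = 0.9, delta: float = 0.05,
                            margin: float = 0.05) -> dict:
    """Minimum n for model training (binary outcome); hypothetical helper."""
    phi = prevalence
    # Criterion 1: expected shrinkage of at least `shrinkage` (limits overfitting).
    n1 = _n_for_shrinkage(p, r2_cs, shrinkage)

    # Criterion 2: apparent vs adjusted Nagelkerke R^2 differ by at most `delta`.
    # Per-person null log-likelihood gives the maximum achievable Cox-Snell R^2.
    ln_l_null = phi * math.log(phi) + (1 - phi) * math.log(1 - phi)
    max_r2_cs = 1.0 - math.exp(2.0 * ln_l_null)
    s_vh = r2_cs / (r2_cs + delta * max_r2_cs)
    n2 = _n_for_shrinkage(p, r2_cs, s_vh)

    # Criterion 3: estimate overall outcome risk within +/- `margin` (95% CI).
    n3 = (1.96 / margin) ** 2 * phi * (1 - phi)

    c1, c2, c3 = (math.ceil(x) for x in (n1, n2, n3))
    final = max(c1, c2, c3)
    return {"criterion1": c1, "criterion2": c2, "criterion3": c3,
            "final": final,
            "events": math.ceil(final * phi),
            "events_per_parameter": final * phi / p}


def validation_n_for_oe(prevalence: float, ci_width: float = 0.2) -> int:
    """Minimum n for evaluation so the 95% CI for O/E (assumed = 1) has width ~ci_width."""
    phi = prevalence
    # Solve exp(1.96*se) - exp(-1.96*se) = ci_width for se = SE(ln O/E).
    se = math.asinh(ci_width / 2.0) / 1.96
    # Large-sample approximation: var(ln O/E) ~= (1 - phi) / (n * phi).
    return math.ceil((1 - phi) / (phi * se ** 2))


if __name__ == "__main__":
    # Illustrative inputs only: 24 parameters, anticipated R^2_CS 0.288, prevalence 0.174.
    print(development_sample_size(p=24, r2_cs=0.288, prevalence=0.174))
    print(validation_n_for_oe(prevalence=0.174))
```

The development calculation takes the largest of the three criteria, so the binding constraint can differ between settings: with a low anticipated R² or many candidate parameters, the shrinkage criteria usually dominate, whereas criterion 3 depends only on outcome prevalence. The validation function covers just one evaluation criterion (calibration-in-the-large via O/E); precision targets for other performance measures would need their own calculations.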


Source journal: Lancet Digital Health Multiple-

CiteScore: 41.20
Self-citation rate: 1.60%
Articles published: 232
Review time: 13 weeks

Journal description: The Lancet Digital Health publishes important, innovative, and practice-changing research on any topic connected with digital technology in clinical medicine, public health, and global health. The journal’s open access content crosses subject boundaries, building bridges between health professionals and researchers. By bringing together the most important advances in this multidisciplinary field, The Lancet Digital Health is the most prominent publishing venue in digital health. We publish a range of content types, including Articles, Reviews, Comments, and Correspondence, helping to promote digital technologies in health practice worldwide.