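A decomposition of Fisher's information to inform sample size for developing or updating fair and precise clinical prediction models for individual risk-part 1: binary outcomes.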

Journal impact factor: 2.6

Diagnostic and Prognostic Research, 9(1): 14. Published 2025-07-08. DOI: 10.1186/s41512-025-00193-9

Richard D Riley, Gary S Collins, Rebecca Whittle, Lucinda Archer, Kym I E Snell, Paula Dhiman, Laura Kirton, Amardeep Legha, Xiaoxuan Liu, Alastair K Denniston, Frank E Harrell, Laure Wynants, Glen P Martin, Joie Ensor

Citations: 0

Abstract

Background: When using a dataset to develop or update a clinical prediction model, small sample sizes raise concerns about overfitting, instability, poor predictive performance and a lack of fairness. For models estimating the risk of a binary outcome, previous research has outlined sample size calculations that target low overfitting and a precise overall risk estimate. However, more guidance is needed for targeting precise and fair individual-level risk estimates.
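The earlier criteria mentioned above are not spelled out in this abstract. The sketch below assumes two calculations commonly used in that line of work. The first targets a precise overall risk with the margin-of-error formula n = (1.96/delta)^2 * phi * (1 - phi). The second targets low overfitting with the shrinkage-based formula n = p / ((S - 1) * ln(1 - R2_CS / S)). The function names are hypothetical, and this is an illustration, not code from pmstabilityss.

```python
import math

def n_overall_risk(phi, margin=0.05, z=1.96):
    """Sample size so the overall outcome proportion phi is estimated
    within +/- margin (normal-approximation 95% CI for a proportion)."""
    return math.ceil((z / margin) ** 2 * phi * (1 - phi))

def n_shrinkage(p, r2_cs, s=0.9):
    """Sample size targeting an expected uniform shrinkage factor s,
    given p candidate parameters and an anticipated Cox-Snell R^2."""
    return math.ceil(p / ((s - 1) * math.log(1 - r2_cs / s)))

print(n_overall_risk(0.5))   # 385
print(n_shrinkage(10, 0.1))  # 850
```

Neither number says anything about how precisely an individual's risk is estimated. That is the gap the Methods address.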

Methods: We propose a decomposition of Fisher's information matrix to help examine sample sizes required for developing or updating a model, aiming for precise and fair individual-level risk estimates. We outline a five-step process for use before data collection or when an existing dataset or pilot study is available. It requires researchers to specify the overall risk in the target population, the (anticipated) distribution of key predictors in the model and an assumed 'core model' either specified directly (i.e. a logistic regression equation is provided) or based on a specified C-statistic and relative effects of (standardised) predictors.
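The first steps ask for three inputs: the overall risk, the predictor distribution, and a directly specified core model. The following is a minimal sketch of how these fit together, using assumptions that do not come from the paper:

- one standardised continuous predictor and one binary predictor with prevalence 0.3;
- hypothetical log-odds ratios of 0.8 and 0.5;
- an intercept chosen by bisection so that the average predicted risk matches an anticipated overall risk of 0.1.

It covers only the "specified directly" route, not the C-statistic route. All helper names are hypothetical.

```python
import math
import random

def expit(z):
    """Inverse logit: log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-z))

def simulate_predictors(n, seed=2025):
    """Hypothetical predictor distribution (an assumption, not from the
    paper): x1 ~ N(0, 1) (standardised), x2 ~ Bernoulli(0.3)."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), 1.0 if rng.random() < 0.3 else 0.0)
            for _ in range(n)]

def mean_risk(X, intercept, betas):
    """Average predicted risk over the predictor sample X."""
    total = 0.0
    for row in X:
        total += expit(intercept + sum(b * x for b, x in zip(betas, row)))
    return total / len(X)

def calibrate_intercept(X, betas, target_risk, lo=-20.0, hi=20.0, tol=1e-10):
    """Bisection on the intercept: mean risk increases with the intercept,
    so raise the lower bound while the mean risk is below target."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_risk(X, mid, betas) < target_risk:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical inputs for the first steps.
overall_risk = 0.1             # anticipated overall risk in the target population
X = simulate_predictors(10_000)
betas = (0.8, 0.5)             # hypothetical core-model log-odds ratios
b0 = calibrate_intercept(X, betas, overall_risk)
print(f"core model: logit(p) = {b0:.3f} + 0.8*x1 + 0.5*x2")
```

Fixing the intercept this way keeps the core model consistent with the specified overall risk. With an existing dataset or pilot study, the observed predictor values would replace the simulated ones.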

Results: We produce closed-form solutions that decompose the variance of an individual's risk estimate into Fisher's unit information matrix, predictor values and the total sample size. This allows researchers to quickly calculate and examine the anticipated precision of individual-level predictions and classifications for specified sample sizes. The information can be presented to key stakeholders (e.g. health professionals, patients, grant funders) to inform target sample sizes for prospective data collection or whether an existing dataset is sufficient. Our proposal is implemented in our new software module pmstabilityss. We provide two real examples and emphasise the importance of clinical context, including any risk thresholds for decision making and fairness checks.
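The decomposition can be sketched using the standard large-sample result for maximum-likelihood logistic regression. Let x_i include a leading 1 for the intercept, and let I = E[p(1 - p) x x^T] be the unit information. Then the anticipated variance of an individual's estimated log-odds at total sample size N is approximately x_i^T I^{-1} x_i / N.

The sketch below assumes this form; it is not the pmstabilityss code. It approximates I by averaging over 20,000 simulated predictor vectors. For one individual, it reports a 95% uncertainty interval for the risk and the probability that the estimate lands on the wrong side of a hypothetical 0.1 risk threshold. The predictor distribution, coefficients, individual and threshold are all hypothetical.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def norm_cdf(z):
    # erfc keeps small tail probabilities accurate.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def invert(A):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def unit_information(X, beta):
    """Unit Fisher information E[p(1-p) x x^T] averaged over the predictor
    sample X; each x gets a leading 1 for the intercept."""
    k = len(beta)
    I = [[0.0] * k for _ in range(k)]
    for row in X:
        x = (1.0,) + tuple(row)
        p = expit(sum(b * v for b, v in zip(beta, x)))
        w = p * (1.0 - p)
        for a in range(k):
            for b in range(k):
                I[a][b] += w * x[a] * x[b] / len(X)
    return I

def individual_precision(x_row, beta, I_inv, N, threshold=None):
    """Risk, 95% interval and (optionally) misclassification probability for
    one individual, using var(logit p_hat) = x^T I^{-1} x / N."""
    x = (1.0,) + tuple(x_row)
    eta = sum(b * v for b, v in zip(beta, x))
    var = sum(x[a] * I_inv[a][b] * x[b]
              for a in range(len(x)) for b in range(len(x))) / N
    se = math.sqrt(var)
    lower, upper = expit(eta - 1.96 * se), expit(eta + 1.96 * se)
    p_wrong = None
    if threshold is not None:
        z = (logit(threshold) - eta) / se
        # Chance the estimate falls on the opposite side of the threshold
        # from the assumed true risk.
        p_wrong = norm_cdf(z) if eta > logit(threshold) else 1.0 - norm_cdf(z)
    return expit(eta), lower, upper, p_wrong

# Hypothetical population and core model (intercept, x1, x2).
rng = random.Random(1)
X = [(rng.gauss(0.0, 1.0), 1.0 if rng.random() < 0.3 else 0.0)
     for _ in range(20_000)]
beta = (-2.6, 0.8, 0.5)
I_inv = invert(unit_information(X, beta))
for N in (500, 1000, 2000, 5000):
    risk, lo, hi, pw = individual_precision((0.0, 1.0), beta, I_inv, N,
                                            threshold=0.1)
    print(N, round(risk, 3), round(lo, 3), round(hi, 3), round(pw, 3))
```

Because I is computed once, scanning N is cheap. The logit-scale standard error shrinks in proportion to 1/sqrt(N), so the printed intervals narrow and the misclassification probability falls as N grows. Repeating the call over different subgroups' predictor values is one way to run the fairness checks mentioned above.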

Conclusions: Our approach helps researchers examine potential sample sizes required to target precise and fair individual-level predictions when developing or updating prediction models for binary outcomes.

Source journal: Diagnostic and Prognostic Research

Self-citation rate: 0.00%

Review time: 18 weeks