Yaqi Cao, Weidong Ma, Ge Zhao, Anne Marie McCarthy, Jinbo Chen
{"title":"A constrained maximum likelihood approach to developing well-calibrated models for predicting binary outcomes.","authors":"Yaqi Cao, Weidong Ma, Ge Zhao, Anne Marie McCarthy, Jinbo Chen","doi":"10.1007/s10985-024-09628-9","DOIUrl":null,"url":null,"abstract":"<p><p>The added value of candidate predictors for risk modeling is routinely evaluated by comparing the performance of models with or without including candidate predictors. Such comparison is most meaningful when the estimated risk by the two models are both unbiased in the target population. Very often data for candidate predictors are sourced from nonrepresentative convenience samples. Updating the base model using the study data without acknowledging the discrepancy between the underlying distribution of the study data and that in the target population can lead to biased risk estimates and therefore an unfair evaluation of candidate predictors. To address this issue assuming access to a well-calibrated base model, we propose a semiparametric method for model fitting that enforces good calibration. The central idea is to calibrate the fitted model against the base model by enforcing suitable constraints in maximizing the likelihood function. This approach enables unbiased assessment of model improvement offered by candidate predictors without requiring a representative sample from the target population, thus overcoming a significant practical challenge. We study theoretical properties for model parameter estimates, and demonstrate improvement in model calibration via extensive simulation studies. Finally, we apply the proposed method to data extracted from Penn Medicine Biobank to inform the added value of breast density for breast cancer risk assessment in the Caucasian woman population.</p>","PeriodicalId":49908,"journal":{"name":"Lifetime Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.2000,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Lifetime Data Analysis","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1007/s10985-024-09628-9","RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/5/8 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"MATHEMATICS, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0
Abstract
The added value of candidate predictors for risk modeling is routinely evaluated by comparing the performance of models with or without including candidate predictors. Such comparison is most meaningful when the estimated risk by the two models are both unbiased in the target population. Very often data for candidate predictors are sourced from nonrepresentative convenience samples. Updating the base model using the study data without acknowledging the discrepancy between the underlying distribution of the study data and that in the target population can lead to biased risk estimates and therefore an unfair evaluation of candidate predictors. To address this issue assuming access to a well-calibrated base model, we propose a semiparametric method for model fitting that enforces good calibration. The central idea is to calibrate the fitted model against the base model by enforcing suitable constraints in maximizing the likelihood function. This approach enables unbiased assessment of model improvement offered by candidate predictors without requiring a representative sample from the target population, thus overcoming a significant practical challenge. We study theoretical properties for model parameter estimates, and demonstrate improvement in model calibration via extensive simulation studies. Finally, we apply the proposed method to data extracted from Penn Medicine Biobank to inform the added value of breast density for breast cancer risk assessment in the Caucasian woman population.
期刊介绍:
The objective of Lifetime Data Analysis is to advance and promote statistical science in the various applied fields that deal with lifetime data, including: Actuarial Science – Economics – Engineering Sciences – Environmental Sciences – Management Science – Medicine – Operations Research – Public Health – Social and Behavioral Sciences.