Chi Gao, P. Choudhury, P. Maas, R. Tamimi, H. Eliassen, N. Chatterjee, M. García-Closas, P. Kraft
Abstract PR02: Validation of breast cancer risk prediction model using Nurses Health and Nurse Health II Studies
DOI: https://doi.org/10.1158/1538-7755.CARISK16-PR02
Abstract
Background: Adding genetic and other biomarkers to breast cancer risk prediction models could markedly improve model discrimination; however, these expanded models have not been validated in a range of populations. In particular, the calibration of these new models (how well the predicted absolute risks match observed risks) has not been established. Good calibration is essential to confirm the utility of these risk models in precision prevention and treatment programs. Large cohort studies provide an ideal setting to validate risk models, as they can be used to validate both relative and absolute risks. In practice, however, genetic and biomarker data are often not available in the full cohort, but only on a subsample of cases and controls. When the rules for sampling cases and controls into the subsample are known, inverse-probability-of-sampling (IPW) weights can be used to estimate empirical absolute risks. When the sampling rules are unknown or complicated, the IPW weights can be estimated by regressing selection into the subsample on matching and other inclusion criteria.

Methods: We evaluated the performance of recently published breast cancer risk prediction models [Maas et al. JAMA Oncol 2016] in the Nurses' Health Study (NHS) and Nurses' Health Study II (NHSII). We first assessed a prediction model that includes only questionnaire data (BMI, hormone replacement therapy (HRT), alcohol consumption, smoking status, height, parity, age at menarche and menopause, age at first birth, and family history of breast cancer). These data are available on all subjects in the NHS and NHSII blood subcohorts: 32,826 women in NHS (with disease follow-up from 1990 to 2012) and 29,611 women in NHSII (1999 to 2013). We will then validate a model that includes both questionnaire data and a polygenic risk score based on 92 established risk SNPs.
Genetic data are available on case-control samples nested within the blood subcohorts: 2,308 breast cancer cases and 3,344 controls from NHS, and 612 breast cancer cases and 933 controls from NHSII. We estimated IPW weights among controls using logistic regression in the blood subcohorts, with sampling as a control as the outcome and the following predictors: age at baseline, menopausal status, HRT, length of HRT use for premenopausal women at baseline, and length of follow-up. We used the iCARE software package (Maas P, Chatterjee N, Wheeler W et al. 2015) to calculate predicted 5- and 10-year absolute risks of breast cancer based on the published models, empirical 5- and 10-year incidence across deciles of predicted risk, and Hosmer-Lemeshow goodness-of-fit and AUC statistics.

Results: For the risk model without genetic information, predicted risks in the NHS blood subcohort ranged from 6.5/1,000 (1st decile) to 20.1/1,000 (10th decile). Although empirical risks increased across deciles at approximately the same rate as predicted risks, empirical risks were higher than predicted (Hosmer-Lemeshow p […]). Due to matching and selection on control status, the baseline distribution of questionnaire risk factors differed between the blood subcohorts and the controls from the nested case-control samples. The IPW-weighted distribution in controls closely matched the distribution in the full subcohorts, suggesting that the weights were well estimated. We will present IPW-based validation of the risk model in the nested case-control samples (work in progress).

Conclusions: These results confirm that breast cancer risk prediction models can discriminate between high-risk and low-risk women, but they also highlight that the accuracy of absolute risk estimates can vary across populations. Findings from this study can inform both model improvement and model application.
Moreover, using IPW weights to approximate a full-cohort analysis offers a potential way to use nested case-control studies in future validation analyses.

This abstract is also being presented as Poster A05.

Citation Format: Chi Gao, Parichoy Pal Choudhury, Paige Maas, Rulla Tamimi, Heather Eliassen, Nilanjan Chatterjee, Montserrat Garcia-Closas, Peter Kraft. Validation of breast cancer risk prediction model using Nurses Health and Nurse Health II Studies. [abstract]. In: Proceedings of the AACR Special Conference: Improving Cancer Risk Prediction for Prevention and Early Detection; Nov 16-19, 2016; Orlando, FL. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2017;26(5 Suppl):Abstract nr PR02.
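
The weighting step described in the Methods can be illustrated with a minimal sketch. It is not the authors' analysis: the cohort is simulated, age is the only predictor (the abstract also lists menopausal status, HRT, length of HRT use, and length of follow-up), and the helper names `sigmoid` and `fit_logistic_1d` and all numeric settings are assumptions made for illustration. The sketch regresses selection as a control on age, sets each sampled control's weight to the inverse of her fitted selection probability, and checks that the weighted age distribution in the sampled controls moves back toward that of the full pool of non-cases.

```python
import math
import random


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_1d(x, y, n_iter=25):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by Newton-Raphson (hypothetical helper)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^{-1} g, using the closed-form 2x2 inverse
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Simulated pool of non-cases (assumption: ages uniform on 45 to 70 years).
random.seed(0)
ages = [random.uniform(45.0, 70.0) for _ in range(20000)]
x = [(a - 57.5) / 10.0 for a in ages]  # centred age, per 10 years

# Assumed sampling mechanism: older women are more likely to be drawn as
# controls, mimicking age matching to cases. In real data this rule is unknown.
selected = [1 if random.random() < sigmoid(-2.0 + 0.8 * xi) else 0 for xi in x]

# Estimate selection probabilities, then weight each sampled control by 1/p_hat.
b0, b1 = fit_logistic_1d(x, selected)
sampled = [
    (a, 1.0 / sigmoid(b0 + b1 * xi))
    for a, xi, s in zip(ages, x, selected)
    if s == 1
]

full_mean = sum(ages) / len(ages)
unweighted_mean = sum(a for a, _ in sampled) / len(sampled)
weighted_mean = sum(a * w for a, w in sampled) / sum(w for _, w in sampled)

print(f"full pool mean age:          {full_mean:.2f}")
print(f"sampled controls, unweighted: {unweighted_mean:.2f}")
print(f"sampled controls, IPW:        {weighted_mean:.2f}")
```

Comparing the weighted and full-pool summaries, as in the last three lines, is the same kind of check the abstract reports for the questionnaire risk factors. With real data, the same weights would then enter the estimates of empirical absolute risk within each decile of predicted risk.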