Using clinical and genetic risk factors for risk prediction of 8 cancers in the UK Biobank.
Jiaqi Hu, Yixuan Ye, Geyu Zhou, Hongyu Zhao
JNCI Cancer Spectrum (Oncology, JCR Q2; impact factor 3.4), published 2024-02-29
DOI: 10.1093/jncics/pkae008 (https://doi.org/10.1093/jncics/pkae008)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10919929/pdf/
Citations: 0
Abstract
Background: Models with polygenic risk scores and clinical factors to predict risk of different cancers have been developed, but these models have been limited by the polygenic risk score-derivation methods and the incomplete selection of clinical variables.
Methods: We used UK Biobank to train the best polygenic risk scores for 8 cancers (bladder, breast, colorectal, kidney, lung, ovarian, pancreatic, and prostate cancers) and select relevant clinical variables from 733 baseline traits through extreme gradient boosting (XGBoost). Combining polygenic risk scores and clinical variables, we developed Cox proportional hazards models for risk prediction in these cancers.
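To make the modeling step concrete, here is a minimal sketch of the quantity a Cox proportional hazards model maximizes, the partial log-likelihood, for the hazard h(t | x) = h0(t) exp(beta · x). It is an illustration only, not the authors' pipeline: the function name, the toy data, and the two covariates (a standardized polygenic risk score and a smoking indicator) are hypothetical, and the sketch assumes no tied event times.

```python
import math

def cox_partial_log_likelihood(beta, covariates, times, events):
    """Cox partial log-likelihood, assuming no tied event times."""
    # Linear predictor beta . x for each individual.
    scores = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored individuals contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = [math.exp(scores[j]) for j, t_j in enumerate(times) if t_j >= t_i]
        total += scores[i] - math.log(sum(risk))
    return total

# Hypothetical toy data: columns are [standardized PRS, smoker indicator].
covariates = [[1.2, 1], [0.3, 0], [-0.5, 1], [-1.0, 0]]
times = [2.0, 5.0, 3.0, 8.0]   # follow-up time in years
events = [1, 0, 1, 0]          # 1 = cancer diagnosed, 0 = censored

ll_null = cox_partial_log_likelihood([0.0, 0.0], covariates, times, events)
ll_alt = cox_partial_log_likelihood([0.5, 0.5], covariates, times, events)
print(ll_null, ll_alt)
```

On this toy data the coefficients (0.5, 0.5) give a higher partial log-likelihood than the null model, which is what fitting would exploit. In practice a survival-analysis library would handle optimization, ties, and standard errors.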
Results: Our models achieved high prediction accuracy for 8 cancers, with areas under the curve ranging from 0.618 (95% confidence interval = 0.581 to 0.655) for ovarian cancer to 0.831 (95% confidence interval = 0.817 to 0.845) for lung cancer. Additionally, our models could identify individuals at a high risk for developing cancer. For example, the risk of breast cancer for individuals in the top 5% score quantile was nearly 13 times greater than for individuals in the lowest 10%. Furthermore, we observed a higher proportion of individuals with high polygenic risk scores in the early-onset group but a higher proportion of individuals at high clinical risk in the late-onset group.
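The two kinds of summary reported above can be sketched as follows: an area under the curve, and a comparison of observed risk between the top 5% and the lowest 10% of the score distribution. This is an assumption-laden illustration, not the authors' evaluation code. The function names and the 20-person toy cohort are hypothetical, the AUC is the plain rank-based (Mann-Whitney) estimate without confidence intervals, and the comparison is a simple ratio of observed case proportions, which may differ from how the abstract's "nearly 13 times" figure was estimated.

```python
def auc(scores, labels):
    """Probability that a random case outranks a random non-case (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))

def relative_risk(scores, labels, top=0.05, bottom=0.10):
    """Ratio of observed case proportions: top score quantile vs. bottom quantile."""
    ranked = sorted(zip(scores, labels))          # ascending by score
    n = len(ranked)
    k_top = max(1, round(n * top))
    k_bottom = max(1, round(n * bottom))
    high = ranked[-k_top:]
    low = ranked[:k_bottom]
    risk_high = sum(y for _, y in high) / len(high)
    risk_low = sum(y for _, y in low) / len(low)
    return risk_high / risk_low                   # undefined if the low group has no cases

# Hypothetical toy cohort of 20 people, listed in order of increasing risk score.
scores = [i / 20 for i in range(1, 21)]
labels = [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1]

auc_value = auc(scores, labels)
rr = relative_risk(scores, labels)
print(auc_value, rr)
```

With so few people per quantile the ratio is very noisy; a cohort-scale analysis would typically report it with a confidence interval.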
Conclusion: Our models demonstrated the potential to predict cancer risk and identify high-risk individuals with great generalizability to different cancers. Our findings suggested that the polygenic risk score model is more predictive for the cancer risk of early-onset patients than for late-onset patients, while the clinical risk model is more predictive for late-onset patients. Meanwhile, combining polygenic risk scores and clinical risk factors has overall better predictive performance than using polygenic risk scores or clinical risk factors alone.