Training, Validating, and Testing Machine Learning Prediction Models for Endometrial Cancer Recurrence.

IF 5.3 Zone 2 (Medicine) Q1 ONCOLOGY

JCO precision oncology Pub Date : 2025-05-01 Epub Date: 2025-05-05 DOI:10.1200/PO-24-00859

Jesus Gonzalez Bosquet, Andrew Polio, Erin George, Ahmad A Tarhini, Casey M Cosgrove, Marilyn S Huang, Bradley Corr, Aliza L Leiser, Bodour Salhia, Kathleen Darcy, Christopher M Tarney, Rob L Dood, Lauren E Dockery, Stephen B Edge, Michael J Cavnar, Lisa Landrum, Rob J Rounbehler, Michelle Churchman, Vincent M Wagner

JCO precision oncology, volume 9, e2400859. Journal Article. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12054588/pdf/

Citations: 0

Abstract


Training, Validating, and Testing Machine Learning Prediction Models for Endometrial Cancer Recurrence.

Purpose: Endometrial cancer (EC) is the most common gynecologic cancer in the United States with rising incidence and mortality. Despite optimal treatment, 15%-20% of all patients will recur. To better select patients for adjuvant therapy, it is important to accurately predict patients at risk for recurrence. Our objective was to train, validate, and test models of EC recurrence using lasso regression and other machine learning (ML) and deep learning (DL) analytics in a large, comprehensive data set.

Methods: Data from patients with EC were downloaded from the Oncology Research Information Exchange Network database and stratified into three groups: low risk (International Federation of Gynecology and Obstetrics [FIGO] grade 1 or 2, stage I; N = 329); high risk (FIGO grade 3 or stage II, III, or IV; N = 324); and nonendometrioid histology (N = 239). Clinical, pathologic, genomic, and genetic data were used for the analysis. Genomic data included microRNA, long noncoding RNA, isoform, and pseudogene expression. Genetic variation included single-nucleotide variation (SNV) and copy-number variation (CNV). In the discovery phase, we selected variables informative for recurrence (P < .05) using univariate analyses of variance. Then, we trained, validated, and tested multivariate models using the selected variables and lasso regression, MATLAB (ML), and TensorFlow (DL).
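The two-stage workflow described in the Methods (univariate screening at P < .05, then a lasso-type multivariate model scored by AUC) can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn rather than the MATLAB and TensorFlow tooling named in the abstract, uses synthetic data in place of the study data, substitutes an L1-penalized logistic regression for the lasso step, and picks the split ratio and the penalty strength `C` arbitrarily.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFpr, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a patient-by-feature matrix (NOT study data):
# 300 "patients", 200 features, about 20% labelled as recurrences.
X, y = make_classification(
    n_samples=300, n_features=200, n_informative=10, n_redundant=0,
    n_clusters_per_class=1, weights=[0.8, 0.2], random_state=0,
)

# Hold out a test set; the 75/25 split is an assumption of this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = make_pipeline(
    StandardScaler(),
    # Univariate ANOVA F-test per feature; keep those with p < 0.05.
    SelectFpr(f_classif, alpha=0.05),
    # L1 (lasso-type) penalty shrinks uninformative coefficients to zero.
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
)
model.fit(X_train, y_train)

n_selected = int(model.named_steps["selectfpr"].get_support().sum())
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"features kept by screening: {n_selected}, test AUC: {test_auc:.2f}")
```

Placing the screening step inside the pipeline means it is fitted on the training portion only, so the held-out AUC is not inflated by feature selection that has seen the test patients. That is a design choice of this sketch; the abstract does not say how the study handled it.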

Results: Recurrence clinic models for low-risk, high-risk, and high-risk nonendometrioid histology had AUCs of 56%, 70%, and 65%, respectively. For training, we selected models with AUC >80%: five for the low-risk group, 20 models for the high-risk group, and 20 for the nonendometrioid group. The two best low-risk models included clinical data and CNVs. For the high-risk group, three of the five best-performing models included pseudogene expression. For the nonendometrioid group, pseudogene expression and SNV were overrepresented in the best models.
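For readers less familiar with the metric, an AUC can be read as the probability that a randomly chosen patient who recurred receives a higher predicted risk than a randomly chosen patient who did not, so 50% corresponds to chance and the 56% reported for the low-risk clinical model is only slightly above it. The following minimal sketch computes AUC directly from that definition; the scores and labels are hypothetical and are not taken from the study.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks; label 1 = recurrence, 0 = no recurrence.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
example_auc = auc(scores, labels)  # 8 of 9 pairs are ordered correctly
print(round(example_auc, 3))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for illustration; library implementations obtain the same value from sorted ranks.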

Conclusion: Prediction models of EC recurrence built with ML and DL analytics had better performance than models with clinical and pathologic data alone. Prospective validation is required to determine clinical utility.


Source journal

JCO precision oncology (ONCOLOGY)

CiteScore: 9.10

Self-citation rate: 4.30%

Articles published: 363