Y Yang, H Zhou, Y K Wang, Y Dai, R J Pi, H Zhang, Z Y Huang, T Wu, J H Yang, W Chen
[Construction and preliminary validation of machine learning predictive models for cervical cancer screening based on human DNA methylation].
中华肿瘤杂志, 47(2): 193-200. Journal Article. Published 2025-02-23. DOI: 10.3760/cma.j.cn112152-20230925-00156 (https://doi.org/10.3760/cma.j.cn112152-20230925-00156)
Abstract
Objective: To construct machine learning predictive models for screening for cervical cancer and precancerous lesions using the methylation characteristics of human genes.
Methods: Human DNA methylation detection was performed on 224 cervical exfoliated cell specimens collected between April 2014 and March 2015 from the Cancer Hospital of the Chinese Academy of Medical Sciences, Tianjin Central Hospital of Gynecology Obstetrics, Xinmi Maternal and Child Health Hospital of Henan Province, West China Second Affiliated Hospital of Sichuan University, and Heping Hospital Affiliated to Changzhi Medical College. Hypermethylated gene fragments related to cervical cancer were selected by screening for high-density, high-association, hypermethylated gene fragments and then applying the LASSO regression algorithm. With cervical intraepithelial neoplasia grade 2 (CIN2) or more severe lesions as the study outcome, machine learning predictive models were constructed using the random forest (RF), naive Bayes (NB), and support vector machine (SVM) algorithms. A total of 144 outpatient specimens were used as the training set, and 80 cervical exfoliated cell specimens from women participating in the cervical cancer screening program were used as the validation set to verify the predictive models. With histological diagnosis as the gold standard, the detection efficacy of the three machine learning predictive models for CIN2 or more severe lesions was compared with that of human papillomavirus (HPV) detection and cytological diagnosis.
Results: In the training set of 144 cases, 34 cases were HPV positive, a positive rate of 23.61%. Cytologically, 37 cases were diagnosed as no intraepithelial lesion or malignancy (NILM) and 107 cases as atypical squamous cells of undetermined significance (ASC-US) or above. Histologically, there were 28 cases without cervical intraepithelial neoplasia or benign cervical lesions, 31 cases of CIN1, 18 cases of CIN2, 31 cases of CIN3, and 36 cases of squamous cell carcinoma. Seven hypermethylated gene fragments were selected from 45 genes, and three machine learning predictive models were constructed using the RF, NB, and SVM algorithms. In the validation set of 80 cases, 28 cases were HPV positive, a positive rate of 35.00%. Cytologically, 65 cases were diagnosed as NILM and 15 cases as ASC-US or above. Histologically, there were 39 cases without cervical intraepithelial neoplasia or benign cervical lesions, 10 cases of CIN1, 10 cases of CIN2, 11 cases of CIN3, and 10 cases of squamous cell carcinoma. In the validation set, the area under the curve (AUC) values for detecting CIN2 or above were 0.90 for the RF model, 0.88 for the NB model, 0.82 for the SVM model, 0.68 for HPV detection, and 0.45 for cytological diagnosis. The DeLong test showed no statistically significant difference in AUC among the RF, NB, and SVM models (all P>0.05). The AUC values of the RF and NB models were higher than that of HPV detection (both P<0.01), and the AUC values of the RF, NB, and SVM models were all higher than that of cytological diagnosis (all P<0.01). The sensitivity of the RF model was similar to that of the NB model (80.65% vs. 77.42%), but the specificity of the NB model was much higher than that of the RF model (93.88% vs. 73.47%).
Conclusion: Among the machine learning predictive models for cervical cancer and precancerous lesions constructed from human DNA methylation, the NB model shows good predictive performance for CIN2 or more severe lesions and may be used for screening for cervical cancer and precancerous lesions.
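The percentages reported in the Results can be cross-checked against the case counts with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' analysis code: the helper names `pct` and `implied_count` are hypothetical, and the true-positive and true-negative counts it derives are inferred from the rounded percentages, since the abstract does not report confusion matrices.

```python
def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage, rounded to two decimals."""
    return round(100 * numerator / denominator, 2)


def implied_count(percent: float, total: int) -> int:
    """Recover the whole-number count that a rounded percentage implies."""
    return round(percent * total / 100)


# Case counts as reported in the abstract.
training_total, training_hpv_pos = 144, 34
validation_total, validation_hpv_pos = 80, 28
validation_histology = {
    "no_CIN_or_benign": 39,
    "CIN1": 10,
    "CIN2": 10,
    "CIN3": 11,
    "SCC": 10,
}

# HPV positive rates: reported as 23.61% and 35.00%.
training_hpv_rate = pct(training_hpv_pos, training_total)
validation_hpv_rate = pct(validation_hpv_pos, validation_total)

# Outcome definition: CIN2 or more severe lesions, by histology.
cin2_plus = sum(validation_histology[k] for k in ("CIN2", "CIN3", "SCC"))
below_cin2 = validation_total - cin2_plus

# ASSUMPTION: these counts are back-calculated from the reported
# percentages; they do not appear in the abstract.
true_pos_at_80_65 = implied_count(80.65, cin2_plus)   # higher reported sensitivity
true_pos_at_77_42 = implied_count(77.42, cin2_plus)   # lower reported sensitivity
true_neg_at_93_88 = implied_count(93.88, below_cin2)  # higher reported specificity
true_neg_at_73_47 = implied_count(73.47, below_cin2)  # lower reported specificity

if __name__ == "__main__":
    print("HPV positive rate, training:", training_hpv_rate)
    print("HPV positive rate, validation:", validation_hpv_rate)
    print("CIN2+ cases:", cin2_plus, "| below CIN2:", below_cin2)
    print("Sensitivity check:", pct(true_pos_at_80_65, cin2_plus),
          pct(true_pos_at_77_42, cin2_plus))
    print("Specificity check:", pct(true_neg_at_93_88, below_cin2),
          pct(true_neg_at_73_47, below_cin2))
```

Under these assumptions the reported sensitivities correspond to 25 and 24 of the 31 CIN2-or-worse cases, and the specificities to 46 and 36 of the 49 cases below CIN2, so the two sensitivities differ by a single case while the specificities differ by ten.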