[Construction and preliminary validation of machine learning predictive models for cervical cancer screening based on human DNA methylation].

Q3 Medicine
Y Yang, H Zhou, Y K Wang, Y Dai, R J Pi, H Zhang, Z Y Huang, T Wu, J H Yang, W Chen
Journal: 中华肿瘤杂志, 47(2), pages 193-200
Publication date: 2025-02-23
Publication type: Journal Article
DOI: 10.3760/cma.j.cn112152-20230925-00156 (https://doi.org/10.3760/cma.j.cn112152-20230925-00156)
Citations: 0

Abstract

Objective: To construct machine learning predictive models for screening cervical cancer and precancerous lesions using the methylation characteristics of human genes.

Methods: Human DNA methylation detection was performed on 224 cervical exfoliated cell specimens collected between April 2014 and March 2015 from the Cancer Hospital of the Chinese Academy of Medical Sciences, Tianjin Central Hospital of Gynecology Obstetrics, Xinmi Maternal and Child Health Hospital of Henan Province, West China Second Affiliated Hospital of Sichuan University, and Heping Hospital Affiliated to Changzhi Medical College. Hypermethylated gene fragments related to cervical cancer were selected by screening for high-density, high-association, hypermethylated gene fragments and by the LASSO regression algorithm. With cervical intraepithelial neoplasia grade 2 (CIN2) or more severe lesions as the outcome, machine learning predictive models based on the random forest (RF), naive Bayes (NB), and support vector machine (SVM) algorithms were constructed. A total of 144 outpatient specimens were used as the training set, and 80 cervical exfoliated cell specimens from women participating in the cervical cancer screening program were used as the test (validation) set to verify the predictive models. With histological diagnosis as the gold standard, the performance of the three machine learning predictive models in detecting CIN2 or more severe lesions was compared with that of human papillomavirus (HPV) detection and cytological diagnosis.

Results: In the training set of 144 cases, 34 cases were HPV positive, a positive rate of 23.61%. Cytologically, 37 cases were diagnosed as no intraepithelial lesion or malignancy (NILM) and 107 cases as atypical squamous cells of undetermined significance (ASC-US) or above. Histologically, there were 28 cases of no cervical intraepithelial neoplasia or of benign cervical lesions, 31 cases of CIN1, 18 cases of CIN2, 31 cases of CIN3, and 36 cases of squamous cell carcinoma. Seven hypermethylated gene fragments were selected from 45 genes, and three machine learning prediction models based on the RF, NB, and SVM algorithms were constructed. In the validation set of 80 cases, 28 cases were HPV positive, a positive rate of 35.00%. Cytologically, 65 cases were diagnosed as NILM and 15 cases as ASC-US or above. Histologically, there were 39 cases of no cervical intraepithelial neoplasia or of benign cervical lesions, 10 cases of CIN1, 10 cases of CIN2, 11 cases of CIN3, and 10 cases of squamous cell carcinoma. In the validation set, the area under the curve (AUC) values of the RF model, NB model, SVM model, HPV detection, and cytological diagnosis for detecting CIN2 or above were 0.90, 0.88, 0.82, 0.68, and 0.45, respectively. The DeLong test showed no statistically significant difference in AUC among the RF, NB, and SVM models (all P>0.05). The AUC values of the RF and NB models were higher than that of HPV detection (both P<0.01), and the AUC values of the RF, NB, and SVM models were higher than that of cytological diagnosis (all P<0.01). Compared with the NB model, the sensitivity of the RF model was similar (80.65% vs. 77.42%), but the specificity of the NB model was much higher than that of the RF model (93.88% vs. 73.47%).

Conclusion: Among the machine learning prediction models for cervical cancer and precancerous lesions constructed from human DNA methylation, the NB model performs well in predicting CIN2 or more severe lesions and may be useful for screening of cervical cancer and precancerous lesions.
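The sensitivity and specificity percentages in the Results can be related back to case counts with a little arithmetic. The sketch below is not from the paper: it only takes the validation-set histology counts and the reported percentages from the abstract, and the helper `implied_count` is a hypothetical name introduced here for illustration. It assumes that sensitivity is computed over all CIN2-or-worse cases and specificity over all remaining cases in the validation set.

```python
# Validation-set histology counts, as reported in the abstract.
histology = {
    "no CIN or benign": 39,
    "CIN1": 10,
    "CIN2": 10,
    "CIN3": 11,
    "squamous cell carcinoma": 10,
}

# Outcome from the abstract: CIN2 or more severe counts as disease-positive.
positive_grades = ("CIN2", "CIN3", "squamous cell carcinoma")
positives = sum(histology[g] for g in positive_grades)
negatives = sum(histology.values()) - positives


def implied_count(rate_percent, denominator):
    """Return the integer k for which k/denominator rounds to rate_percent.

    Hypothetical helper: raises ValueError if no integer count reproduces
    the reported percentage to two decimal places.
    """
    k = round(rate_percent / 100 * denominator)
    if abs(100 * k / denominator - rate_percent) >= 0.005:
        raise ValueError(
            f"{rate_percent}% does not match any count out of {denominator}"
        )
    return k


# Reported sensitivities (the abstract lists them as 80.65% vs. 77.42%).
for sens in (80.65, 77.42):
    tp = implied_count(sens, positives)
    print(f"sensitivity {sens}%: {tp}/{positives} detected, {positives - tp} missed")

# Reported specificities: NB 93.88%, RF 73.47%.
for model, spec in (("NB", 93.88), ("RF", 73.47)):
    tn = implied_count(spec, negatives)
    print(
        f"{model} specificity {spec}%: {tn}/{negatives} correct, "
        f"{negatives - tn} false positives"
    )
```

Under these assumptions the validation set has 31 CIN2-or-worse cases and 49 others, so the two sensitivities correspond to 25 and 24 detected cases, while the specificity gap corresponds to 3 false positives for the NB model against 13 for the RF model.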

Source journal: 中华肿瘤杂志 (Medicine: Medicine (all)). CiteScore: 1.40. Self-citation rate: 0.00%. Articles published: 10433.