Exploring Ovarian Cancer Prediction Models and Potential Markers Using Machine Learning.

Impact factor 1.1; Zone 4, Medicine; Q4, Medical Laboratory Technology
Huijing Luo, Xiaofang Zhang, Dongsha Shi, Yanv Ren, Wenyan Tian, Ruiyu Ma, Zuoliang Dong
Annals of Clinical and Laboratory Science, 55(2):153-165. Published 2025-03-01. Journal article.

Abstract

Objective: To develop machine learning models, facilitate a more accurate diagnosis of ovarian cancer (OC), and explore potential markers.

Methods: Overall, 311 patients diagnosed with OC, 56 with borderline ovarian tumors (OTs), and 368 with benign OTs were defined as the derivation cohort and randomly divided into training (70%) and internal validation (30%) sets. An independent external validation cohort was also established. A total of 34 variables, including patients' demographic characteristics and laboratory test results, were collected. Models were developed using an artificial neural network, a support vector machine, a random forest, and extreme gradient boosting (XGBoost).
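
The 70%/30% division described above can be illustrated with a minimal sketch. The abstract only says the cohort was "randomly divided"; splitting each diagnostic group separately (stratification), the fixed seed, and the function name `stratified_split` are assumptions made here for illustration, not the authors' procedure.

```python
import random
from collections import Counter


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split indices into training and validation sets, class by class.

    Hypothetical helper: the abstract does not say whether the random
    split was stratified, so this is only one plausible way to do it.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    train_idx, val_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                      # random order within the class
        cut = round(len(idx) * train_frac)    # size of the training share
        train_idx.extend(idx[:cut])
        val_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(val_idx)


# Cohort sizes taken from the abstract; the labels stand in for patients.
labels = ["OC"] * 311 + ["borderline"] * 56 + ["benign"] * 368
train_idx, val_idx = stratified_split(labels)
train_counts = Counter(labels[i] for i in train_idx)
val_counts = Counter(labels[i] for i in val_idx)
```

Splitting within each group keeps the proportion of OC, borderline, and benign cases roughly equal in both sets, which matters most for the small borderline group.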

Results: All four models achieved high accuracy, with XGBoost achieving the highest area under the curve (AUC). When using the XGBoost model to differentiate OC from borderline and benign OTs, the AUC and 95% confidence interval, sensitivity, specificity, positive predictive value, and negative predictive value of the training set were 0.973 (0.962-0.985), 84.2%, 96.6%, 93.9%, and 90.6%, respectively. For the internal validation set, the values were 0.932 (0.897-0.966), 74.7%, 92.0%, 85.5%, and 85.2%. The eight most important variables were human epididymis protein 4, carbohydrate antigen 125, lactate dehydrogenase, D-dimer, age, testosterone, follicle-stimulating hormone, and hemoglobin. Subgroup analyses also revealed that this model exhibited outstanding performance in identifying early-stage OC and epithelial OC.
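
For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, positive predictive value, negative predictive value, and AUC are conventionally defined. The function names and all counts and scores are hypothetical and chosen only for illustration; they are not data from the study, and the abstract does not state how the authors computed the AUC or its confidence interval.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # OC cases correctly flagged
        "specificity": tn / (tn + fp),  # non-OC cases correctly cleared
        "ppv": tp / (tp + fp),          # flagged cases that are truly OC
        "npv": tn / (tn + fn),          # cleared cases that are truly non-OC
    }


def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive case scores higher
    than a random negative case, with ties counted as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts and model scores, not taken from the paper.
metrics = diagnostic_metrics(tp=80, fp=10, tn=90, fn=20)
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

Unlike sensitivity and specificity, the two predictive values depend on the proportion of OC cases in the set being evaluated, so they are not directly comparable across cohorts with different case mixes.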

Conclusion: Machine learning models demonstrate excellent accuracy in distinguishing OC from borderline and benign OTs, with several potential markers being validated.

Source journal: Annals of Clinical and Laboratory Science (Medicine: Medical Laboratory Technology)
CiteScore: 1.60
Self-citation rate: 0.00%
Articles published: 112
Review time: 6-12 weeks
Journal description: The Annals of Clinical & Laboratory Science welcomes manuscripts that report research in clinical science, including pathology, clinical chemistry, biotechnology, molecular biology, cytogenetics, microbiology, immunology, hematology, transfusion medicine, organ and tissue transplantation, therapeutics, toxicology, and clinical informatics.