Exploring Ovarian Cancer Prediction Models and Potential Markers Using Machine Learning.
Authors: Huijing Luo, Xiaofang Zhang, Dongsha Shi, Yanv Ren, Wenyan Tian, Ruiyu Ma, Zuoliang Dong
Journal: Annals of clinical and laboratory science, 55(2), pages 153-165
Publication date: 2025-03-01
Publication type: Journal Article
Abstract
Objective: To develop machine learning models that facilitate a more accurate diagnosis of ovarian cancer (OC) and to explore potential markers.
Methods: Overall, 311 patients diagnosed with OC, 56 with borderline ovarian tumors (OTs), and 368 with benign OTs were defined as the derivation cohort and randomly divided into training (70%) and internal validation (30%) sets. An independent external validation cohort was also established. A total of 34 variables, including patients' demographic characteristics and laboratory test results, were collected. Models were developed using an artificial neural network, a support vector machine, random forest, and extreme gradient boosting (XGBoost).
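The random 70/30 division described in the Methods can be sketched as follows. This is a minimal illustration using only Python's standard library, not the authors' code: the patient identifiers, the seed, and the function name `split_derivation_cohort` are hypothetical, and because the abstract does not say whether the split was stratified by diagnosis, a plain shuffle is assumed.

```python
import random

# Cohort sizes as reported in the Methods; the label strings are shorthand chosen here.
COHORT = {"OC": 311, "borderline": 56, "benign": 368}


def split_derivation_cohort(cohort, train_percent=70, seed=0):
    """Shuffle all patients and cut the list into training and validation parts."""
    # One (patient_id, label) record per patient; the IDs are placeholders.
    records = [
        (f"{label}-{i}", label)
        for label, count in cohort.items()
        for i in range(count)
    ]
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the split reproducible
    rng.shuffle(records)
    # Integer arithmetic avoids floating-point rounding at the 70% boundary.
    n_train = len(records) * train_percent // 100
    return records[:n_train], records[n_train:]


train, validation = split_derivation_cohort(COHORT)
print(len(train), len(validation))  # 514 221 for the 735-patient derivation cohort
```

A plain shuffle does not guarantee that the small borderline group (56 patients) is represented proportionally in both sets; a stratified split would shuffle and cut each diagnosis group separately.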
Results: All four models achieved high accuracy, with XGBoost achieving the highest area under the curve (AUC). When the XGBoost model was used to differentiate OC from borderline and benign OTs, the AUC (95% confidence interval), sensitivity, specificity, positive predictive value, and negative predictive value for the training set were 0.973 (0.962-0.985), 84.2%, 96.6%, 93.9%, and 90.6%, respectively. For the internal validation set, the corresponding values were 0.932 (0.897-0.966), 74.7%, 92.0%, 85.5%, and 85.2%. The eight most important variables were human epididymis protein 4, carbohydrate antigen 125, lactate dehydrogenase, D-dimer, age, testosterone, follicle-stimulating hormone, and hemoglobin. Subgroup analyses also revealed that this model exhibited outstanding performance in identifying early-stage OC and epithelial OC.
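For readers unfamiliar with the metrics reported above, the sketch below shows how sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and AUC are conventionally computed. It is an illustration under stated assumptions, not the authors' analysis: the function names are hypothetical, and the confusion-matrix counts and scores are invented for demonstration rather than taken from the study. The confidence interval for the AUC is not covered here.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts (OC = positive class)."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true OC cases flagged as OC
        "specificity": tn / (tn + fp),  # share of non-OC cases correctly ruled out
        "ppv": tp / (tp + fp),          # share of OC predictions that are correct
        "npv": tn / (tn + fn),          # share of non-OC predictions that are correct
    }


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a random positive case scores higher than
    a random negative case, with ties counted as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and model scores, NOT data from the study.
metrics = diagnostic_metrics(tp=80, fp=10, tn=90, fn=20)
example_auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"auc: {example_auc:.3f}")
```

Sensitivity and specificity depend only on the classifier's behaviour within each class, whereas PPV and NPV also depend on how common OC is in the evaluated set, so they can shift between cohorts with different case mixes.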
Conclusion: Machine learning models demonstrate excellent accuracy in distinguishing OC from borderline and benign OTs, and several potential markers were validated.
About the journal:
The Annals of Clinical & Laboratory Science welcomes manuscripts that report research in clinical science, including pathology, clinical chemistry, biotechnology, molecular biology, cytogenetics, microbiology, immunology, hematology, transfusion medicine, organ and tissue transplantation, therapeutics, toxicology, and clinical informatics.