Construction and validation of machine learning models for predicting lymph node metastasis in cutaneous malignant melanoma: a large population-based study.
Ling-Feng Lan, Yi-Long Kai, Xiao-Ling Xu, Jun-Kun Zhang, Guang-Bo Xu, Yan-Bi Dai, Yan Shen, Hua-Ya Lu, Ben Wang
Translational Cancer Research, 2025; 14(2): 706-716. Published 2025-02-28 (Epub 2025-02-18). DOI: 10.21037/tcr-24-1672 (https://doi.org/10.21037/tcr-24-1672)
Abstract
Background: Lymph node status is essential for determining the prognosis of cutaneous malignant melanoma (CMM). This study aimed to develop a machine learning (ML) model for predicting lymph node metastases (LNM) in CMM.
Methods: We extracted data on 6,196 patients, including known clinicopathologic variables, from the Surveillance, Epidemiology, and End Results (SEER) database. We built models to predict the presence of LNM in CMM using six ML algorithms: logistic regression (LR), support vector machine (SVM), Complement Naive Bayes (CNB), Extreme Gradient Boosting (XGBoost), random forest (RF), and k-nearest neighbors (kNN). The adaptive synthetic (ADASYN) sampling method was used to address class imbalance. We assessed model performance using average precision (AP), sensitivity, specificity, accuracy, F1 score, precision-recall curves, calibration plots, and decision curve analysis (DCA). SHapley Additive exPlanations (SHAP) analysis was used to generate visualized explanations tailored to individual patients.
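The threshold-free and threshold-based metrics named in the Methods can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and the AP formula assumes predicted scores without ties.

```python
from typing import Dict, Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision as the mean of the precision values at each positive case.

    Equivalent to AP = sum_n (R_n - R_{n-1}) * P_n when all scores are distinct
    (an assumption of this sketch; tied scores need threshold grouping).
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive cases")
    # Rank cases from the highest to the lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_positives += 1
            # Recall rises by 1/n_pos here; precision at this rank is TP/rank.
            ap += (true_positives / rank) / n_pos
    return ap


def threshold_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy, precision, and F1 at one cutoff."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


# Hypothetical toy data: 1 = LNM present, 0 = LNM absent.
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.2, 0.1]

if __name__ == "__main__":
    print("AP:", average_precision(toy_labels, toy_scores))
    print(threshold_metrics(toy_labels, toy_scores))
```

AP summarizes the whole precision-recall curve, so it does not depend on a cutoff, whereas sensitivity, specificity, accuracy, and F1 all change with the chosen threshold.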
Results: Among the 6,196 CMM cases, 19.9% (n=1,234) presented with LNM. The XGBoost model showed the best predictive performance when compared with the other algorithms (AP of 0.805). XGBoost showed that age and Breslow thickness were the two most important factors related to LNM.
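To put the reported AP in context, the short sketch below reproduces the prevalence arithmetic and compares the AP with a no-skill baseline. It relies on the general property that a classifier assigning random scores has an expected AP roughly equal to the positive-class prevalence, and it assumes that the evaluation data share the cohort-wide prevalence, which the abstract does not state. The variable names are illustrative.

```python
# Figures taken from the Results paragraph.
n_total = 6196       # CMM cases in the cohort
n_lnm = 1234         # cases presenting with LNM
reported_ap = 0.805  # AP reported for the XGBoost model

# Share of LNM-positive cases in the cohort.
prevalence = n_lnm / n_total

# Assumption: a no-skill classifier scores an AP close to the prevalence.
no_skill_ap = prevalence

# How many times the reported AP exceeds that baseline.
lift_over_baseline = reported_ap / no_skill_ap

print(f"LNM prevalence: {prevalence:.1%}")
print(f"Reported AP is about {lift_over_baseline:.1f}x the no-skill baseline")
```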
Conclusions: The XGBoost model predicted LNM of CMM with a high level of precision. We hope that this model could assist surgeons in accurately evaluating surgical approaches and determining the extent of surgery, while also guiding the subsequent adjuvant therapies, thereby improving the prognosis of patients.
About the journal:
Translational Cancer Research (Transl Cancer Res TCR; Print ISSN: 2218-676X; Online ISSN 2219-6803; http://tcr.amegroups.com/) is an Open Access, peer-reviewed journal, indexed in Science Citation Index Expanded (SCIE). TCR publishes laboratory studies of novel therapeutic interventions as well as clinical trials that evaluate new treatment paradigms for cancer. It also publishes results of novel research investigations that bridge the laboratory and clinical settings, including risk assessment, cellular and molecular characterization, prevention, detection, diagnosis and treatment of human cancers, with the overall goal of improving the clinical care of cancer patients. The focus of TCR is original, peer-reviewed, science-based research that successfully advances clinical medicine toward the goal of improving patients' quality of life. The editors and an international advisory group of scientists and clinician-scientists, as well as other experts, will hold TCR articles to high-quality standards. We accept Original Articles as well as Review Articles, Editorials and Brief Articles.