Prediction of Overall and Relapse-Free Survival in Triple-Negative Breast Cancer Patients Through Machine Learning-Based Clustering on Clinical Data
Juan Pablo Alzate-Granados, Luis Fernando Niño
Clinical Breast Cancer 25(7), Pages 714-719. Published 2025-07-29. Journal Article.
DOI: 10.1016/j.clbc.2025.07.027
https://www.sciencedirect.com/science/article/pii/S1526820925002241
Introduction
Triple-negative breast cancer (TNBC) accounts for 15% to 20% of breast cancer cases and is characterized by its aggressiveness and high relapse rate. Because TNBC lacks hormone receptors and HER2 expression, standard treatment relies on chemotherapy, which yields limited overall survival (OS) and relapse-free survival (RFS). The molecular heterogeneity of TNBC complicates risk stratification and personalized treatment. In this context, unsupervised machine learning could help identify clinically homogeneous subgroups and support prognostic prediction.
Objective
To develop predictive models for OS and RFS in TNBC patients using machine learning algorithms, specifically k-prototypes for subgroup identification and random forest for outcome prediction.
Methods
A retrospective cohort study was conducted on 4808 TNBC patients diagnosed between 2012 and 2024. Clinical, demographic, and biomolecular variables were drawn from anonymized clinical records. The k-prototypes algorithm was applied to cluster patients into groups with shared characteristics. Random forest models were then trained and evaluated with stratified cross-validation, using metrics such as AUC, sensitivity, and specificity. Cox regression was used to identify risk factors associated with mortality and relapse.
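The abstract does not describe how the clustering was configured. As an illustration only, the sketch below shows the mixed dissimilarity that k-prototypes is generally built on: squared Euclidean distance over numeric variables plus a weight gamma times the number of mismatched categorical variables. The feature names, the standardized values, the prototypes, and gamma = 0.5 are all hypothetical and are not taken from the study.

```python
def mixed_dissimilarity(num_a, cat_a, num_b, cat_b, gamma):
    """k-prototypes style dissimilarity between two records.

    Numeric part: squared Euclidean distance.
    Categorical part: count of mismatching categories, weighted by gamma.
    """
    numeric_part = sum((a - b) ** 2 for a, b in zip(num_a, num_b))
    categorical_part = sum(1 for a, b in zip(cat_a, cat_b) if a != b)
    return numeric_part + gamma * categorical_part


def assign_to_prototypes(patients, prototypes, gamma):
    """Return, for each patient, the index of the closest prototype."""
    labels = []
    for num, cat in patients:
        distances = [
            mixed_dissimilarity(num, cat, p_num, p_cat, gamma)
            for p_num, p_cat in prototypes
        ]
        labels.append(distances.index(min(distances)))
    return labels


# Hypothetical records: ([age_z, charlson_z], [ecog_band, brca_status]).
# Numeric values are assumed to be standardized beforehand.
patients = [
    ([-1.0, -0.5], ["0-1", "wild-type"]),
    ([1.2, 1.0], ["3+", "mutated"]),
    ([0.9, 1.4], ["2", "mutated"]),
]

# Hypothetical prototypes: numeric means plus categorical modes per cluster.
prototypes = [
    ([-0.8, -0.6], ["0-1", "wild-type"]),
    ([1.0, 1.1], ["3+", "mutated"]),
]

labels = assign_to_prototypes(patients, prototypes, gamma=0.5)
print(labels)
```

A full k-prototypes run alternates this assignment step with recomputing each prototype (means for numeric variables, modes for categorical ones) until the labels stop changing. The choice of gamma controls how much the categorical variables influence the grouping relative to the numeric ones.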
Results
Four clusters with distinct risk profiles were identified. Overall mortality was 28.8% and relapse occurred in 40.9% of patients, with a median follow-up of 8.46 years. The highest-risk cluster had a mortality rate of 42.3% and a relapse rate of 54.2%, and was associated with poorer functional status (ECOG ≥3) and a high prevalence of BRCA1/2 mutations (71%). The random forest model achieved 80% accuracy in mortality prediction (AUC = 0.78) and 75% accuracy in relapse prediction (AUC = 0.76). The Charlson Comorbidity Index, ECOG status, BRCA1/2 status, and PD-L1 expression were key determinants in outcome prediction.
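Accuracy and AUC answer different questions, which matters when reading the figures above: with 28.8% overall mortality, a rule that always predicts survival would already be right for 71.2% of patients. The sketch below, written under toy assumptions, computes sensitivity, specificity, and accuracy at a fixed threshold, and AUC as the fraction of (event, non-event) pairs that the score ranks correctly. The labels, scores, and the 0.5 threshold are invented for illustration and are not the study's data.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = event (e.g. death), 0 = no event; scores are predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.8, 0.4, 0.35, 0.2, 0.2, 0.1, 0.05]

threshold = 0.5  # hypothetical cut-off for turning risks into predictions
y_pred = [1 if s >= threshold else 0 for s in scores]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)   # share of events that were flagged
specificity = tn / (tn + fp)   # share of non-events correctly left unflagged
accuracy = (tp + tn) / len(y_true)
model_auc = auc(y_true, scores)

print(sensitivity, specificity, accuracy, model_auc)
```

Sensitivity, specificity, and accuracy change with the threshold and with class prevalence, whereas AUC depends only on how the scores rank patients. For that reason the two kinds of figures are best read together.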
Discussion
The findings support the relevance of machine learning in TNBC stratification. A clinically meaningful classification was achieved, outperforming traditional models based solely on clinical or genomic variables. Comorbidity burden and tumor biomarkers played crucial roles in outcome prediction. Despite its strengths, the study has limitations, including its retrospective design and the absence of transcriptomic data. Prospective validation of these models could enhance their applicability in clinical practice.
About the journal:
Clinical Breast Cancer is a peer-reviewed bimonthly journal that publishes original articles describing various aspects of clinical and translational research of breast cancer. Clinical Breast Cancer is devoted to articles on detection, diagnosis, prevention, and treatment of breast cancer. The main emphasis is on recent scientific developments in all areas related to breast cancer. Specific areas of interest include clinical research reports from various therapeutic modalities, cancer genetics, drug sensitivity and resistance, novel imaging, tumor genomics, biomarkers, and chemoprevention strategies.