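Prediction of Overall and Relapse-Free Survival in Triple-Negative Breast Cancer Patients Through Machine Learning-Based Clustering on Clinical Data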

Impact factor 2.5; Zone 3 (Medicine); JCR Q2 (Oncology)

Clinical Breast Cancer. Pub Date: 2025-07-29. DOI: 10.1016/j.clbc.2025.07.027

Juan Pablo Alzate-Granados, Luis Fernando Niño

Volume 25, Issue 7, Pages 714-719. Publication type: Journal Article. Open access: no. Indexed via Semantic Scholar. Full text: https://www.sciencedirect.com/science/article/pii/S1526820925002241

Citations: 0

Abstract

Prediction of Overall and Relapse-Free Survival in Triple-Negative Breast Cancer Patients Through Machine Learning-Based Clustering on Clinical Data

Introduction

Triple-negative breast cancer (TNBC) accounts for 15% to 20% of breast cancer cases and is characterized by its aggressiveness and high relapse rate. Due to the absence of hormonal receptors and HER2, standard treatment relies on chemotherapy, yielding limited outcomes in overall survival (OS) and relapse-free survival (RFS). The molecular heterogeneity of TNBC complicates risk stratification and personalized treatment approaches. In this context, unsupervised machine learning could improve the identification of clinically homogeneous subgroups and facilitate prognostic predictions.

Objective

To develop predictive models for OS and RFS in TNBC patients using machine learning algorithms, specifically k-prototypes for subgroup identification and random forest for outcome prediction.
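The k-prototypes algorithm named above handles mixed data by combining a squared Euclidean distance on numeric features with a mismatch count on categorical features, weighted by a parameter usually called gamma. The following is a minimal sketch of that dissimilarity and of the assignment step only. The feature names, prototype values, and the choice of gamma are hypothetical illustrations and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def mixed_dissimilarity(
    num_a: Sequence[float],
    cat_a: Sequence[str],
    num_b: Sequence[float],
    cat_b: Sequence[str],
    gamma: float = 1.0,
) -> float:
    """Squared Euclidean distance on numeric features plus gamma times
    the number of categorical features that differ."""
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical_part = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric_part + gamma * categorical_part


def nearest_prototype(
    num: Sequence[float],
    cat: Sequence[str],
    prototypes: List[Tuple[Sequence[float], Sequence[str]]],
    gamma: float = 1.0,
) -> int:
    """Return the index of the prototype with the smallest dissimilarity."""
    distances = [
        mixed_dissimilarity(num, cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return distances.index(min(distances))


# Hypothetical prototypes: standardized (age, Charlson index) plus
# (ECOG group, BRCA1/2 status). These are illustrative values, not study data.
prototypes = [
    ([-0.5, -0.8], ["ECOG0-1", "wild-type"]),
    ([0.9, 1.2], ["ECOG>=3", "mutated"]),
]
patient_num, patient_cat = [1.0, 1.0], ["ECOG>=3", "wild-type"]
cluster = nearest_prototype(patient_num, patient_cat, prototypes, gamma=0.5)
```

A full k-prototypes run would alternate this assignment step with a prototype update (means for numeric features, modes for categorical ones) until assignments stop changing.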

Methods

A retrospective cohort study was conducted on 4808 TNBC patients diagnosed between 2012 and 2024. Clinical, demographic, and biomolecular variables were analyzed from anonymized clinical records. The k-prototypes algorithm was applied to cluster patients into groups based on shared characteristics. Subsequently, predictive models using random forest were trained and evaluated through stratified cross-validation and metrics such as AUC, sensitivity, and specificity. Cox regression was used to identify risk factors associated with mortality and relapse.
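The evaluation steps mentioned above (stratified cross-validation, sensitivity, specificity) can be sketched in plain Python. This is an illustrative outline under stated assumptions, not the authors' pipeline: the number of folds, the random seed, and the toy labels are hypothetical, and the classifier itself is left out.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple


def stratified_folds(labels: Sequence[int], n_folds: int = 5, seed: int = 0) -> List[List[int]]:
    """Split sample indices into folds that keep class proportions roughly equal."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across folds.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Assumes both classes are present in y_true."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy labels: 10 events (1) and 20 non-events (0).
labels = [1] * 10 + [0] * 20
folds = stratified_folds(labels, n_folds=5, seed=0)

# Hypothetical predictions for eight held-out patients.
sens, spec = sensitivity_specificity(
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 1],
)
```

In each round, one fold serves as the held-out test set and the remaining folds train the model. Stratifying keeps the event rate in every fold close to the overall rate, which matters when the outcome classes are imbalanced.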

Results

Four clusters with distinct risk profiles were identified. Overall mortality was 28.8%, and relapse occurred in 40.9%, with a median follow-up time of 8.46 years. The highest-risk group exhibited a mortality rate of 42.3% and a relapse rate of 54.2%, associated with poorer functional status (ECOG ≥3) and a high prevalence of BRCA1/2 mutations (71%). The random forest model achieved 80% accuracy in mortality prediction (AUC = 0.78) and 75% accuracy in relapse prediction (AUC = 0.76). Factors such as the Charlson Comorbidity Index, ECOG, BRCA1/2 status, and PD-L1 expression were key determinants in outcome prediction.
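Accuracy and AUC, reported side by side above, measure different things: accuracy depends on a decision threshold, whereas AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes both on hypothetical scores to show how they can diverge. The data and the 0.5 threshold are assumptions for illustration only.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a correct pair."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Share of cases classified correctly when scores >= threshold count as positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(1 for t, p in zip(y_true, preds) if t == p) / len(y_true)


# Hypothetical predicted risks for five patients (1 = event occurred).
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2]

auc_value = roc_auc(y_true, scores)
acc_value = accuracy(y_true, scores)
```

On this toy data, five of the six positive-negative pairs are ranked correctly, while only three of five patients are classified correctly at the 0.5 threshold. The pairwise loop is quadratic in sample size, which is fine for illustration but would be replaced by a rank-based computation on a cohort of thousands.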

Discussion

The findings confirm the relevance of machine learning in TNBC stratification. A clinically meaningful classification was achieved, outperforming traditional models based solely on clinical or genomic variables. Comorbid burden and tumor biomarkers played crucial roles in outcome prediction. Despite its strengths, the study has limitations, including its retrospective nature and the absence of transcriptomic data. Prospective validation of these models could enhance their applicability in clinical practice.

Source journal

Clinical Breast Cancer (Medicine: Oncology)

CiteScore: 5.40

Self-citation rate: 3.20%

Articles published: 174

Review time: 48 days

About the journal: Clinical Breast Cancer is a peer-reviewed bimonthly journal that publishes original articles describing various aspects of clinical and translational research of breast cancer. Clinical Breast Cancer is devoted to articles on detection, diagnosis, prevention, and treatment of breast cancer. The main emphasis is on recent scientific developments in all areas related to breast cancer. Specific areas of interest include clinical research reports from various therapeutic modalities, cancer genetics, drug sensitivity and resistance, novel imaging, tumor genomics, biomarkers, and chemoprevention strategies.