Machine learning prediction of pathological complete response and overall survival of breast cancer patients in an underserved inner-city population

IF 6.1, Zone 1 (Medicine), Q1 (Oncology)

Breast Cancer Research. Pub Date: 2024-01-10. DOI: 10.1186/s13058-023-01762-w

Kevin Dell’Aquila, Abhinav Vadlamani, Takouhie Maldjian, Susan Fineberg, Anna Eligulashvili, Julie Chung, Richard Adam, Laura Hodges, Wei Hou, Della Makower, Tim Q. Duong


Citations: 0

Abstract

Generalizability of predictive models for pathological complete response (pCR) and overall survival (OS) in breast cancer patients requires diverse datasets. This study employed four machine learning models to predict pCR and OS up to 7.5 years using data from a diverse and underserved inner-city population. Demographics, staging, tumor subtypes, income, insurance status, and data from radiology reports were obtained from 475 breast cancer patients on neoadjuvant chemotherapy in an inner-city health system (01/01/2012 to 12/31/2021). Logistic Regression, Neural Network, Random Forest, and Gradient Boosted Regression models were used to predict both outcomes (pCR and OS) with fivefold cross-validation. pCR was not associated with age, race, ethnicity, tumor staging, Nottingham grade, income, or insurance status (p > 0.05). pCR rate differed by tumor subtype: ER−/HER2+ showed the highest rate, followed by triple-negative, ER+/HER2+, and ER+/HER2− (all p < 0.05). pCR was also associated with tumor size (p < 0.003) and background parenchymal enhancement (BPE) (p < 0.01). Machine learning models ranked ER+/HER2−, ER−/HER2+, tumor size, and BPE as top predictors of pCR (AUC = 0.74–0.76). OS was associated with race, pCR status, tumor subtype, and insurance status (p < 0.05), but not with ethnicity or income (p > 0.05). Machine learning models ranked tumor stage, pCR, nodal stage, and triple-negative subtype as top predictors of OS (AUC = 0.83–0.85). When patients were grouped by tumor subtype, neither OS nor pCR differed by race or ethnicity within any subtype (p > 0.05). Tumor subtypes and imaging characteristics were top predictors of pCR in our inner-city population. Insurance status, race, tumor subtypes, and pCR were associated with OS. Machine learning models predicted pCR with AUCs of 0.74–0.76 and OS with AUCs of 0.83–0.85.
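The evaluation protocol named in the abstract, fivefold cross-validation scored by AUC, can be illustrated with a minimal sketch. This is not the authors' code. It assumes synthetic data rather than the study cohort, fits only a plain logistic regression rather than all four model families, and uses only the Python standard library. All function names (`auc`, `stratified_folds`, `fit_logistic`, `cross_validated_auc`, `make_synthetic`) are hypothetical and introduced here for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def auc(scores, labels):
    """ROC AUC: fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, X):
    """Return the predicted probability of the positive class for each row."""
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def cross_validated_auc(X, y, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, and return the per-fold AUCs."""
    aucs = []
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores = predict(model, [X[i] for i in held_out])
        aucs.append(auc(scores, [y[i] for i in held_out]))
    return aucs


def make_synthetic(n=200, seed=42):
    """Synthetic data only: feature 0 drives the outcome, feature 1 is noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(2.0 * x[0]) else 0 for x in X]
    return X, y


X, y = make_synthetic()
fold_aucs = cross_validated_auc(X, y)
print("per-fold AUC:", [round(a, 2) for a in fold_aucs])
print("mean AUC:", round(sum(fold_aucs) / len(fold_aucs), 2))
```

Each sample is scored exactly once, by a model that never saw it during training, so the averaged AUC estimates out-of-sample discrimination. Comparing several model families under this protocol would mean swapping `fit_logistic` and `predict` for another learner while keeping the fold assignment fixed.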




Source journal

Breast Cancer Research (Medicine: Oncology)

Self-citation rate: 0.00%


About the journal: Breast Cancer Research is an international, peer-reviewed online journal, publishing original research, reviews, editorials and reports. Open access research articles of exceptional interest are published in all areas of biology and medicine relevant to breast cancer, including normal mammary gland biology, with special emphasis on the genetic, biochemical, and cellular basis of breast cancer. In addition to basic research, the journal publishes preclinical, translational and clinical studies with a biological basis, including Phase I and Phase II trials.