Prediction of Breast Cancer Remission.

IF 1.2, Zone 4 (Medicine), JCR Q4, HEALTH CARE SCIENCES & SERVICES
Vladimir Cardenas, Yalin Li, Samika Shrestha, Hong Xue
Quality Management in Health Care. Journal Article. Published 2025-04-01. DOI: 10.1097/QMH.0000000000000513 (https://doi.org/10.1097/QMH.0000000000000513)
Citations: 0

Abstract

Background and objectives: This study uses electronic health record (EHR) and social determinants of health (SDOH) data to predict breast cancer remission. It emphasizes easily accessible information to improve predictive models, enable early detection of high-risk patients, and support targeted interventions and personalized care strategies.

Methods: This study aims to identify individuals who are unlikely to respond to standard treatment of breast cancer. We identified 1621 patients with breast cancer by selecting patients who received tamoxifen in the All of Us Research Database. The dependent variable, remission, was defined using tamoxifen exposure as a proxy. Data preprocessing involved creating dummy variables for diseases, demographic factors, and socioeconomic factors, and handling missing values to maintain data integrity. For feature selection, we applied the strong rule for feature elimination and then logistic least absolute shrinkage and selection operator (LASSO) regression with 5-fold cross-validation, retaining only predictors whose coefficients had an absolute value greater than 0.01. We then trained machine learning models using logistic regression, random forest, naïve Bayes, and extreme gradient boosting, scoring model performance with the area under the receiver operating characteristic curve (AUROC). This yielded the race-neutral model performance. Finally, we analyzed model performance on race and ethnicity test populations, including Non-Hispanic White, Non-Hispanic Black, Hispanic, and Other Race or Ethnicity. This yielded the race-specific model performance.
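
The feature selection step described above can be illustrated with a minimal sketch. It fits an L1-penalized (LASSO) logistic regression by proximal gradient descent and keeps only predictors whose coefficient exceeds 0.01 in absolute value. This is not the authors' code: the function names, the toy data, the fixed `penalty` value, and the solver are all assumptions made for illustration, and the strong-rule screening and the 5-fold cross-validation used to choose the penalty are omitted.

```python
import math


def soft_threshold(value, amount):
    """Shrink value toward zero by amount (the proximal step for an L1 penalty)."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, penalty, step=0.5, iterations=500):
    """Fit logistic regression with an L1 penalty on the coefficients.

    The intercept is not penalized. Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(iterations):
        # Residual = predicted probability minus observed label, per row.
        residuals = [
            sigmoid(intercept + sum(c * x for c, x in zip(coefs, row))) - label
            for row, label in zip(X, y)
        ]
        intercept -= step * sum(residuals) / n
        for j in range(p):
            grad_j = sum(r * row[j] for r, row in zip(residuals, X)) / n
            # Gradient step followed by soft-thresholding, which yields exact zeros.
            coefs[j] = soft_threshold(coefs[j] - step * grad_j, step * penalty)
    return intercept, coefs


def select_features(names, coefs, threshold=0.01):
    """Keep predictors whose coefficient magnitude exceeds the threshold."""
    return [name for name, c in zip(names, coefs) if abs(c) > threshold]


# Hypothetical toy data: two dummy variables and a binary outcome.
# The first column is associated with the outcome; the second is not.
names = ["has_condition_a", "has_condition_b"]
X = [[1, 1], [1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1], [0, 0]]
y = [1, 1, 1, 0, 0, 0, 0, 1]

intercept, coefs = fit_l1_logistic(X, y, penalty=0.05)
print(select_features(names, coefs))
```

In practice the penalty strength would be chosen by cross-validation rather than fixed, and a tested library implementation would be preferable to a hand-written solver.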

Results: The models achieved AUROC values between 0.68 and 0.75, with logistic regression and random forest trained on data without interaction terms performing best. Feature selection identified significant factors such as melanocytic nevus and bone disorders, highlighting their importance for predictive accuracy. Race-specific model performance was lower than race-neutral model performance for the Non-Hispanic Black and Other Race or Ethnicity groups.
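
The comparison between race-neutral and race-specific performance can be sketched as follows, assuming race-neutral means AUROC on the whole test set and race-specific means AUROC recomputed within each subgroup. The function names, group labels, and toy scores are hypothetical and are not data from the study. AUROC is computed here from its rank interpretation: the probability that a randomly chosen positive case is scored above a randomly chosen negative case, with ties counted as one half.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Returns None when only one class is present, since AUROC is undefined then.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        return None
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


def auroc_by_group(labels, scores, groups):
    """Compute AUROC separately within each group of the test population."""
    result = {}
    for group in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == group]
        result[group] = auroc([labels[i] for i in idx], [scores[i] for i in idx])
    return result


# Hypothetical toy test set: outcome labels, model scores, and group membership.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.8, 0.3, 0.5]
groups = ["group_a"] * 4 + ["group_b"] * 4

print("pooled:", auroc(labels, scores))
print("by group:", auroc_by_group(labels, scores, groups))
```

A model can score well on the pooled test set while ranking cases poorly within a particular subgroup, which is why the subgroup breakdown is reported separately.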

Conclusions: Our research demonstrates the feasibility of predicting breast cancer non-remission using EHR and SDOH data, achieving acceptable performance without complex predictors. Addressing the data quality limitations and refining remission indicators can further improve the models' utility for early treatment decisions, fostering improved patient outcomes and support throughout the cancer journey.

Source journal
Quality Management in Health Care (HEALTH CARE SCIENCES & SERVICES)
CiteScore: 1.90
Self-citation rate: 8.30%
Articles published: 108
Journal description: Quality Management in Health Care (QMHC) is a peer-reviewed journal that provides a forum for our readers to explore the theoretical, technical, and strategic elements of health care quality management. The journal's primary focus is on organizational structure and processes as these affect the quality of care and patient outcomes. In particular, it:
- Builds knowledge about the application of statistical tools, control charts, benchmarking, and other devices used in the ongoing monitoring and evaluation of care and of patient outcomes;
- Encourages research in and evaluation of the results of various organizational strategies designed to bring about quantifiable improvements in patient outcomes;
- Fosters the application of quality management science to patient care processes and clinical decision-making;
- Fosters cooperation and communication among health care providers, payers and regulators in their efforts to improve the quality of patient outcomes;
- Explores links among the various clinical, technical, administrative, and managerial disciplines involved in patient care, as well as the role and responsibilities of organizational governance in ongoing quality management.