Short-Term Prediction Model for Breast Cancer Risk Based on One Million Medical Records
Ofer Feinstein, Dan Ofer, Eitan Bachmat, Sivan Gazit, Michal Linial, Tehillah S Menes
Clinical Breast Cancer, published 2025-07-29. DOI: 10.1016/j.clbc.2025.07.025 (https://doi.org/10.1016/j.clbc.2025.07.025)
Abstract
Background: Despite progress in breast cancer screening, many women are still diagnosed at an advanced stage. We sought to develop a short-term (one-year) prediction model for breast cancer risk, based on readily available data from electronic medical records (EMRs), to support decision-making.
Methods: We conducted a retrospective cohort study using data from 1,039,212 members of a large healthcare organization between 1985 and 2021. During the study years, 18,959 people were diagnosed with breast cancer. Longitudinal personal medical information, including demographics, cancer-related family history, smoking habits, medical history, fertility treatments, surgeries, biopsies, medications, BMI, blood pressure, and lab tests, was used to predict the outcome: a breast cancer diagnosis one year from the recorded data. Prediction models were trained using CatBoost, a gradient-boosted decision tree method. SHapley Additive exPlanations (SHAP) values were used to estimate the marginal impact of each feature on model performance, taking the other features into account.
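To make the idea behind SHAP values concrete, the following is a minimal sketch of an exact Shapley value computation for a single prediction. It is not the authors' pipeline: the abstract does not describe the implementation, and SHAP tooling for tree models normally relies on a much faster algorithm than this brute-force enumeration. The function `shapley_values`, the toy model `toy_risk_score`, its coefficients, and the patient and baseline values are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are set to their baseline value.
    The cost grows exponentially with the number of features, so this
    is suitable only for tiny illustrative examples.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for target in names:
        others = [f for f in names if f != target]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {
                    f: (x[f] if f in coalition else baseline[f]) for f in names
                }
                with_target = dict(without)
                with_target[target] = x[target]
                # Marginal contribution of `target` given this coalition
                total += weight * (predict(with_target) - predict(without))
        phi[target] = total
    return phi


def toy_risk_score(f):
    """Hypothetical score with one interaction term (not the paper's model)."""
    score = 0.02 * f["age"] + 0.5 * f["breast_biopsies"]
    score += 0.1 * f["surgical_consultations"] * f["breast_biopsies"]
    return score


# Hypothetical patient and reference ("baseline") feature values
patient = {"age": 60, "surgical_consultations": 2, "breast_biopsies": 1}
baseline = {"age": 50, "surgical_consultations": 0, "breast_biopsies": 0}

phi = shapley_values(toy_risk_score, patient, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:.3f}")
```

In this toy case the contributions sum to the difference between the patient's score and the baseline score, and the interaction term is split evenly between the two features involved. Ranking features by the magnitude of such contributions, averaged over many records, is one common way to obtain an importance ordering like the one reported in the Results.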
Results: The model includes numerous features that are available from the EMR but not utilized in existing breast cancer risk models (e.g., medications, systolic blood pressure, and TSH levels). The informative features, ranked by SHAP values, include age, the number of surgical consultations, and the number of breast biopsies. The model achieved high performance, with an area under the ROC curve (AUC-ROC) of 0.85.
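For readers less familiar with the metric, AUC-ROC can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. The sketch below computes it directly from that definition. The helper `auc_roc` and the labels and scores are made up for illustration and have no connection to the study data; a real evaluation would normally use an established library routine instead.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Runs in O(P * N) time, which is fine for small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = diagnosed within the horizon) and predicted risks
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70, 0.50, 0.90]

auc = auc_roc(labels, scores)
print(f"AUC-ROC = {auc:.3f}")
```

Under this reading, the reported value of 0.85 means that in roughly 85% of case and non-case pairs the model assigns the higher risk to the case. AUC-ROC describes ranking quality only and says nothing by itself about calibration or about the positive predictive value at a given threshold.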
Conclusions: Data readily available from the EMR can assist clinicians in assessing short-term breast cancer risk.
About the journal:
Clinical Breast Cancer is a peer-reviewed bimonthly journal that publishes original articles describing various aspects of clinical and translational research of breast cancer. Clinical Breast Cancer is devoted to articles on detection, diagnosis, prevention, and treatment of breast cancer. The main emphasis is on recent scientific developments in all areas related to breast cancer. Specific areas of interest include clinical research reports from various therapeutic modalities, cancer genetics, drug sensitivity and resistance, novel imaging, tumor genomics, biomarkers, and chemoprevention strategies.