Short-Term Prediction Model for Breast Cancer Risk Based on One Million Medical Records

IF 2.5 | Zone 3 (Medicine) | JCR Q2 (Oncology)
Ofer Feinstein, Dan Ofer, Eitan Bachmat, Sivan Gazit, Michal Linial, Tehillah S Menes
Clinical Breast Cancer | Published 2025-07-29 | DOI: 10.1016/j.clbc.2025.07.025 | https://doi.org/10.1016/j.clbc.2025.07.025
Citations: 0

Abstract


Background: Despite progress in breast cancer screening, many women are still diagnosed at an advanced stage. We sought to develop a short-term (one-year) prediction model for breast cancer risk, based on data readily available in electronic medical records (EMRs), to support decision-making.

Methods: We conducted a retrospective cohort study using data from 1,039,212 members of a large healthcare organization between 1985 and 2021. During the study years, 18,959 people were diagnosed with breast cancer. Longitudinal personal medical information, including demographics, cancer-related family history, smoking habits, medical history, fertility treatments, surgeries, biopsies, medications, BMI, blood pressure, and lab tests, was used to predict the outcome: breast cancer diagnosis one year from the recorded data. Prediction models were trained using the CatBoost decision tree methodology. SHapley Additive exPlanations (SHAP) values were used to estimate the marginal impact of each feature on the model performance, considering the other features.
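To illustrate what a Shapley-style "marginal impact, considering the other features" means, here is a minimal sketch that computes exact Shapley values by averaging each feature's marginal contribution over all feature orderings. It is an illustration only: the scoring function `toy_score`, its coefficients, the input `x`, and the `baseline` are hypothetical and are not taken from the paper, and the brute-force enumeration shown here is not the algorithm a SHAP library would use for tree models.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for input x relative to a baseline input.

    Features not yet "switched on" keep their baseline value. Each feature's
    value is its marginal contribution averaged over all feature orderings.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]          # switch feature i from baseline to x
            new = predict(current)
            phi[i] += new - prev       # marginal contribution in this ordering
            prev = new
    n_orders = factorial(n)
    return [p / n_orders for p in phi]


def toy_score(features):
    """Hypothetical risk score (assumption, not the paper's model)."""
    age, biopsies, consults = features
    return 0.01 * age + 0.5 * biopsies + 0.2 * consults + 0.1 * biopsies * consults


x = [60, 2, 3]          # hypothetical patient: age, breast biopsies, surgical consultations
baseline = [50, 0, 0]   # hypothetical reference patient
phi = shapley_values(toy_score, x, baseline)
for name, value in zip(["age", "biopsies", "consults"], phi):
    print(f"{name}: {value:.3f}")
```

The attributions sum to the difference between the score for `x` and the score for `baseline`, and the interaction term is split evenly between the two features involved. Exhaustive enumeration grows factorially with the number of features, so it is only practical for toy cases like this one.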

Results: The model includes numerous features available from the EMR that are not utilized in existing breast cancer risk models (e.g., medications, systolic blood pressure, and TSH levels). The informative features, ranked by SHAP values, include age, the number of surgical consultations, and the number of breast biopsies. The model achieved high performance, with an area under the ROC curve (AUC-ROC) of 0.85.
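For readers less familiar with the metric, AUC-ROC can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. The sketch below computes it directly from that definition. The labels and scores are made-up toy values, not data from the study, and the function name `auc_roc` is an assumption for this example.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                correct += 1.0
            elif p == q:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = diagnosed within the horizon, 0 = not diagnosed
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.2, 0.6]
print(auc_roc(labels, scores))  # 0.75
```

Here 12 of the 16 positive-negative pairs are ordered correctly, giving 0.75. The pairwise loop is quadratic in the number of records, so a rank-based implementation would be needed at the scale of a million-record cohort.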

Conclusions: Using data readily available from the EMR can assist clinicians in assessing short-term breast cancer risk.

Source journal: Clinical Breast Cancer (Medicine, Oncology)
CiteScore: 5.40
Self-citation rate: 3.20%
Articles published: 174
Review time: 48 days
Journal description: Clinical Breast Cancer is a peer-reviewed bimonthly journal that publishes original articles describing various aspects of clinical and translational research of breast cancer. Clinical Breast Cancer is devoted to articles on detection, diagnosis, prevention, and treatment of breast cancer. The main emphasis is on recent scientific developments in all areas related to breast cancer. Specific areas of interest include clinical research reports from various therapeutic modalities, cancer genetics, drug sensitivity and resistance, novel imaging, tumor genomics, biomarkers, and chemoprevention strategies.