Construction of an Assistant Forecast System for Breast Cancer Oncotype Dx Recurrence Risk by Machine Learning

X. Kong, Lin Zhang, Quanda Zhang, Jiun Choong, Sicong Ma, X. Qin, Z. Qi, Ran Cheng, Yi Fang, Z. Ge, Yu Jiang, Jing Wang
Biomaterials eJournal, vol. 24, no. 1 (published 2020-07-03). DOI: 10.2139/ssrn.3642585
Citations: 1

Abstract

Background: TAILORx data confirm that using the 21-gene expression assay Oncotype DX (ODX; Genomic Health, Redwood City, CA) to assess the risk of early-stage breast cancer recurrence can spare women unnecessary chemotherapy. However, its high up-front cost (list price, $4,175) may discourage use. In addition, for technical reasons the test cannot be widely used in developing countries, especially in relatively poor areas.

Methods: Using the Surveillance, Epidemiology, and End Results (SEER) database, we first fitted logistic regression models to identify variables that might be significantly associated with breast cancer patients' ODX recurrence scores (RS) and risk levels. Second, using a series of machine learning (ML) methods, including random forest (RF), gradient boosting decision tree (GBDT), and XGBoost, we developed an assistant forecast system that predicts ODX recurrence risk [low-to-intermediate risk (RS = 2–25) vs. high risk (RS = 26–100)] from an individual's sociodemographic and clinicopathological information. The system was then validated on a held-out validation set obtained by a training-test split of the original data set.

Findings: We identified 111,635 patients with breast cancer, of whom 86,617 (77.59%) were aged 50 years or younger. Based on their ODX RSs, 23,514 patients (21.1%) were in the low-risk group, 71,439 (64.0%) in the intermediate-risk group, and 16,682 (14.9%) in the high-risk group. In ordinal (ordered) logit regression, the variables significantly associated with ODX recurrence scores were age, sex, race, primary tumor site, histopathological grade, tumor size, pathology, PR status, and HER2 status (all P < 0.05).
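The abstract names an ordinal (ordered) logit model for the three ODX risk levels but does not give its specification. A standard proportional-odds form, assumed here rather than taken from the paper, with $Y \in \{1, 2, 3\}$ for low, intermediate, and high risk, covariate vector $\mathbf{x}$ (age, sex, race, and so on), cut-points $\theta_j$, and coefficients $\boldsymbol{\beta}$:

```latex
\begin{aligned}
\log\frac{P(Y \le j \mid \mathbf{x})}{1 - P(Y \le j \mid \mathbf{x})}
  &= \theta_j - \mathbf{x}^{\top}\boldsymbol{\beta},
  \qquad j = 1, 2, \quad \theta_1 < \theta_2, \\
P(Y = 1 \mid \mathbf{x}) &= \sigma(\theta_1 - \mathbf{x}^{\top}\boldsymbol{\beta}), \\
P(Y = 2 \mid \mathbf{x}) &= \sigma(\theta_2 - \mathbf{x}^{\top}\boldsymbol{\beta})
  - \sigma(\theta_1 - \mathbf{x}^{\top}\boldsymbol{\beta}), \\
P(Y = 3 \mid \mathbf{x}) &= 1 - \sigma(\theta_2 - \mathbf{x}^{\top}\boldsymbol{\beta}),
  \qquad \sigma(z) = \frac{1}{1 + e^{-z}}.
\end{aligned}
```

Under this convention a single coefficient vector is shared by both cut-points (the proportional-odds assumption), and a positive coefficient shifts probability toward higher risk levels. Statistical packages differ in the sign convention, so coefficients should be read against the package's own definition.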
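The Methods paragraph defines the binary target by RS cut-offs, and the Findings paragraph gives three-level counts. The following minimal sketch uses hypothetical helper names (`binary_risk_label`, `class_shares`), encodes the stated cut-offs, and recomputes the class shares from the reported counts.

```python
# Minimal sketch with hypothetical helper names: the binary outcome described
# in the Methods paragraph, plus the class balance implied by the reported counts.


def binary_risk_label(rs: int) -> str:
    """Map an ODX recurrence score to the abstract's binary target.

    Cut-offs follow the abstract: RS 2-25 -> low-to-intermediate, RS 26-100 -> high.
    Scores outside 2-100 are rejected because the abstract does not cover them.
    """
    if not 2 <= rs <= 100:
        raise ValueError(f"recurrence score outside the 2-100 range: {rs}")
    return "high" if rs >= 26 else "low-to-intermediate"


# Patient counts per ODX risk level, as reported in the Findings paragraph.
COUNTS = {"low": 23_514, "intermediate": 71_439, "high": 16_682}


def class_shares(counts: dict) -> dict:
    """Percentage of the cohort in each risk level."""
    total = sum(counts.values())
    return {level: 100 * n / total for level, n in counts.items()}


if __name__ == "__main__":
    shares = class_shares(COUNTS)
    print("total:", sum(COUNTS.values()))  # 111635, matching the reported cohort
    for level, pct in shares.items():
        print(f"{level}: {pct:.1f}%")  # low 21.1%, intermediate 64.0%, high 14.9%
    # Accuracy of a baseline that always predicts "low-to-intermediate",
    # assuming the reported high-risk group equals RS >= 26.
    print(f"majority baseline: {100 - shares['high']:.1f}%")  # 85.1%
```

The recomputed shares agree with the reported percentages. The abstract does not state whether the reported overall accuracy refers to this binary task, whether the high-risk group above is exactly RS ≥ 26, or whether the test split keeps this class mix. If all three hold, a model that always predicts low-to-intermediate risk would already reach about 85.1%, which is a useful reference point when reading the accuracy figure.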
With our assistant forecast system, once a breast cancer patient's sociodemographic and clinicopathological information has been entered accurately, the computer automatically forecasts the patient's ODX recurrence risk level together with an associated probability. In validation, the best overall accuracy was 87.02% (ordered logistic regression). The reported specificity was 99.06% (ordered logistic regression), and the reported sensitivity was 86.0% (RF).

Interpretation: Our assistant forecast system, based on sociodemographic and clinicopathological data, provides clinicians with an alternative tool for estimating breast cancer patients' ODX recurrence risk level and could help inform adjuvant treatment decisions. In the future, the tool merits retrospective validation in clinical practice and application in real clinical settings.
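The Methods paragraph describes validation on a held-out part of the same SEER cohort. The abstract states neither the split ratio nor whether the split was stratified. The sketch below is written under the assumption that a stratified random split was used, so that the rarer high-risk class appears in both parts at roughly its overall rate. The function name, the 30% default test fraction, and the seed are all hypothetical. In practice, scikit-learn's `train_test_split(..., stratify=y)` does the same job.

```python
import random
from collections import defaultdict


def stratified_train_test_split(records, labels, test_fraction=0.3, seed=42):
    """Hypothetical helper: split (record, label) pairs so that each label keeps
    roughly its overall share in both the training and the test part.

    test_fraction and seed are illustrative; the abstract reports neither.
    """
    if len(records) != len(labels):
        raise ValueError("records and labels must have the same length")
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must lie strictly between 0 and 1")

    # Group row indices by class label.
    indices_by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        indices_by_label[label].append(idx)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train_idx, test_idx = [], []
    for indices in indices_by_label.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    # Restore the original row order within each part.
    train = [(records[i], labels[i]) for i in sorted(train_idx)]
    test = [(records[i], labels[i]) for i in sorted(test_idx)]
    return train, test


if __name__ == "__main__":
    # Toy cohort: 15 high-risk and 85 low-to-intermediate-risk patient IDs.
    ids = list(range(100))
    y = ["high"] * 15 + ["low-to-intermediate"] * 85
    train, test = stratified_train_test_split(ids, y, test_fraction=0.2)
    print(len(train), len(test))  # 80 20
    print(sum(1 for _, lab in test if lab == "high"))  # 3
```

Stratifying per label keeps the 15% high-risk share in the toy test set (3 of 20). A plain random split could, by chance, leave the test set with noticeably more or fewer high-risk patients.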
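The validation results are reported as accuracy, specificity, and sensitivity. In the sketch below, high risk is treated as the positive class. This is an assumption, because the abstract does not name the positive class. The function name is hypothetical, and the toy labels are invented for illustration rather than taken from the study.

```python
def binary_metrics(y_true, y_pred, positive="high"):
    """Accuracy, sensitivity and specificity for a binary classifier.

    `positive` marks the class counted as a positive prediction; treating
    high risk as positive is an assumption, since the abstract does not say.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one prediction is required")

    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # high risk correctly flagged
            else:
                fn += 1  # high risk missed
        elif pred == positive:
            fp += 1  # low-to-intermediate wrongly flagged as high
        else:
            tn += 1  # low-to-intermediate correctly cleared

    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else nan,  # true-positive rate
        "specificity": tn / (tn + fp) if tn + fp else nan,  # true-negative rate
    }


if __name__ == "__main__":
    # Invented toy labels, not data from the study.
    y_true = ["high", "high"] + ["low-to-intermediate"] * 8
    y_pred = ["high", "low-to-intermediate"] + ["low-to-intermediate"] * 7 + ["high"]
    print(binary_metrics(y_true, y_pred))
    # {'accuracy': 0.8, 'sensitivity': 0.5, 'specificity': 0.875}
```

The abstract attributes its accuracy and specificity figures to ordered logistic regression and its sensitivity figure to RF. Because these come from different models, they should not be read as a single model's operating point.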