A machine learning-based analysis of nationwide cancer comprehensive genomic profiling data across cancer types to identify features associated with recommendation of genome-matched therapy

Impact factor 7.1; Zone 2 (Medicine); JCR Q1 (Oncology)
H. Ikushima, K. Watanabe, A. Shinozaki-Ushiku, K. Oda, H. Kage
ESMO Open, volume 9, issue 12, Article 103998. Published 2024-11-25. Journal article.
DOI: 10.1016/j.esmoop.2024.103998
https://www.sciencedirect.com/science/article/pii/S205970292401768X

Abstract

Background

Comprehensive genomic profiling (CGP) has a low probability of identifying druggable mutations, and its financial and time costs hinder widespread adoption. To enhance the effectiveness and efficiency of cancer precision medicine, it is critical to identify the characteristics of patients who are most likely to benefit from CGP.

Patients and methods

This nationwide retrospective study employed machine learning models to predict whether CGP would identify a genome-matched therapy, using a national database covering 99.7% of the patients who underwent CGP in Japan from June 2019 to November 2023. Prediction models were constructed for the overall cancer population, specific cancer types, and the adolescent and young adult (AYA) group. The SHapley Additive exPlanations (SHAP) algorithm was applied to elucidate the clinical features contributing to model predictions.
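
To illustrate the idea behind SHAP-style attribution, the following is a minimal sketch of exact Shapley values for a single prediction. It is not the study's pipeline: the scoring function `toy_score`, its feature names, and every coefficient (including their signs) are hypothetical, chosen only so the result can be checked by hand. Features outside a coalition are replaced by an assumed baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features that are absent from a coalition take their baseline value.
    The cost grows exponentially with the number of features, so this is
    only practical for a handful of features.
    """
    names = list(x)
    n = len(names)
    values = {}
    for target in names:
        others = [f for f in names if f != target]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {
                    f: (x[f] if f in coalition else baseline[f]) for f in names
                }
                with_target = dict(without)
                with_target[target] = x[target]
                total += weight * (predict(with_target) - predict(without))
        values[target] = total
    return values


def toy_score(p):
    """Hypothetical additive score; coefficients are arbitrary, not fitted."""
    score = 0.10
    if p["cancer_type"] == "lung":
        score += 0.15
    if p["liver_metastasis"]:
        score -= 0.05
    score += 0.02 * p["n_metastatic_sites"]
    return score


patient = {"cancer_type": "lung", "liver_metastasis": True, "n_metastatic_sites": 2}
baseline = {"cancer_type": "other", "liver_metastasis": False, "n_metastatic_sites": 0}

phi = shapley_values(toy_score, patient, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

Because the toy score is additive, each attribution equals that feature's own term, and the attributions sum to the difference between the patient's score and the baseline score. Practical SHAP implementations for tree ensembles use faster algorithms than this brute-force enumeration.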

Results

This study included 60 655 patients [mean age (standard deviation), 60.8 years (14.5 years); 50.1% males]. CGP identified at least one genome-matched therapy in 11 227 cases (18.5%). The best prediction model was eXtreme Gradient Boosting (XGBoost), with an area under the receiver operating characteristic curve of 0.819. Cancer type was the most important predictor (negative for pancreas and positive for breast and lung), followed by age, presence of liver metastasis, and number of metastatic sites. Analysis of cancer type-specific models identified several organ-specific features, including sex, interval between cancer diagnosis and CGP, sampling site, and CGP panel. Among 3455 AYA patients, genome-matched therapies were identified in 459 patients (13.3%). The AYA-specific model achieved an area under the receiver operating characteristic curve of 0.768, with bone tumor identified as a negative predictor in addition to those identified in the overall cancer population model.
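
As a small sketch of how the figures above can be read, the code below recomputes the two reported proportions from the stated counts and shows one standard way to compute an area under the receiver operating characteristic curve: the fraction of positive/negative pairs in which the positive case receives the higher score. The labels and scores are made-up toy values, not study data, and the function `auroc` is an illustrative helper rather than the authors' evaluation code.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Proportions recomputed from the counts reported in the abstract.
overall_rate = 11227 / 60655  # genome-matched therapy identified, all patients
aya_rate = 459 / 3455         # genome-matched therapy identified, AYA patients

# Toy labels (1 = therapy identified) and model scores, for illustration only.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
toy_auc = auroc(toy_labels, toy_scores)

print(f"overall: {overall_rate:.1%}, AYA: {aya_rate:.1%}, toy AUROC: {toy_auc:.3f}")
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.819 and 0.768 should be interpreted.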

Conclusion

Several factors predicting the identification of genome-matched therapies through CGP were identified for the overall cancer population and cancer type-specific subpopulations. Expedited CGP is recommended for patients who match the identified profile to facilitate early targeted therapy.