Using machine learning algorithms to predict colorectal cancer

Impact factor 7.6; Region 1 (Medicine); JCR Q1, Health Care Sciences & Services
Xingjian Xiao, Bo Hong, Kubra Maqsood, Xiaohan Yi, Guoqun Xie, Hailei Zhao, Bo Sun, Jianying Mao, Shiyou Liu, Xianglong Xu
Journal: The Lancet Regional Health: Western Pacific, volume 55, Article 101355
Publication date: 2025-02-01
Publication type: Journal Article
DOI: 10.1016/j.lanwpc.2024.101355
URL: https://www.sciencedirect.com/science/article/pii/S2666606524003493

Abstract

Background

Colorectal cancer (CRC) is the second most common cancer in China, and middle-aged and elderly adults are at high risk. However, the colonoscopy examination rate in this group is very low. As of 2020, the colonoscopy examination rate in China was 914.8 per 100,000 people, with a highly uneven distribution across regions. Given the high incidence and mortality of colorectal cancer and the low colonoscopy uptake among people who test positive at initial colorectal cancer screening, further interventions are needed. The objective of this study was to use machine learning and data from about 0.2 million consultations to predict colorectal cancer and identify important predictors.

Methods

This study was based on a population-based cross-sectional survey. We used data from 5,664 cases with colonoscopy results out of 49,701 initial positive consultations in the colorectal cancer screening project in Baoshan District, Shanghai, from 2013 to 2021. Multiple machine learning models, including an adaptive boosting classifier and a gradient boosting machine, were built to predict colorectal cancer. The outcome was defined by clinical colonoscopy: patients diagnosed with colorectal cancer on colonoscopy were considered to have colorectal cancer. A model was considered acceptable for predicting colorectal cancer if its area under the curve (AUC) exceeded 0.7. The optimal model was used to identify predictors of colorectal cancer.
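The acceptance rule in the Methods can be illustrated with a small sketch. It assumes the AUC is the area under the ROC curve, computed here in its rank form: the probability that a randomly chosen case with colorectal cancer receives a higher predicted score than a randomly chosen case without it. The function names and the toy labels and scores are hypothetical and are not taken from the study; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
from typing import Sequence

ACCEPTABLE_AUC = 0.7  # threshold stated in the Methods


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def is_acceptable(auc: float, threshold: float = ACCEPTABLE_AUC) -> bool:
    """Apply the 'AUC exceeding 0.7' rule (strictly greater than)."""
    return auc > threshold


# Hypothetical toy data, NOT from the study: 1 = CRC on colonoscopy, 0 = no CRC.
toy_labels = [0, 0, 1, 0, 1, 0, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.05, 0.50, 0.70]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}, acceptable = {is_acceptable(toy_auc)}")
```

The pairwise loop is quadratic in the number of cases, which is fine for a demonstration but not for a sample of 5,664; sorting-based implementations give the same value more efficiently.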

Findings

The incidence of colorectal cancer was 3.58% (203/5,664) and the colonoscopy rate was 11.4% (5,664/49,701). Non-invasive predictors such as sociodemographic information, behavioural history, and medical history were used to predict the current occurrence of colorectal cancer. The accuracy of the Gradient Boosting Machine, Support Vector Machine, and Light Gradient Boosting Machine reached 0.86, while the accuracy of eXtreme Gradient Boosting reached 0.84 in predicting the occurrence of colorectal cancer. Among the variables predicting colorectal cancer, age, occupation, education, history of bowel cancer in first-degree relatives, and history of cholecystitis were important predictors.
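The two percentages in the Findings follow directly from the reported counts, as the short check below shows. It also computes, purely as an illustration, the accuracy of a trivial rule that labels everyone as not having colorectal cancer. That comparison is an assumption of this sketch: the abstract does not say on which subset, or after what resampling, the reported accuracies were measured, so the figures are not directly comparable. It only suggests why a discrimination measure such as the AUC is useful alongside accuracy when the outcome is rare.

```python
# Counts reported in the Findings.
crc_cases = 203
colonoscopies = 5_664
initial_positives = 49_701

crc_rate = crc_cases / colonoscopies
colonoscopy_rate = colonoscopies / initial_positives

# Assumption for illustration only: a rule that predicts "no CRC" for every
# participant, evaluated on the full colonoscopy sample.
always_negative_accuracy = 1 - crc_rate

print(f"CRC among colonoscopies: {crc_rate:.2%}")
print(f"Colonoscopy rate:        {colonoscopy_rate:.1%}")
print(f"All-negative accuracy:   {always_negative_accuracy:.1%}")
```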

Interpretation

Machine learning methods using non-invasive predictors can accurately predict colorectal cancer in individuals with positive initial screening results. Our predictive models can provide additional information on colorectal cancer risk, which may help increase the colonoscopy examination rate in this group, where colonoscopy rates are currently low. The models can thereby enhance screening rates, aiding disease prevention.

Funding

This study was supported by the Health Promotion and Education of the Key Medical Specialty of Baoshan District, Shanghai (BSZK-2023-BZ14), the Traditional Chinese Medicine Research Project of Shanghai Municipal Health Commission (20240N108), and the Construction of Traditional Chinese Medicine Inheritance and Innovation Development Demonstration Pilot Projects in Pudong New Area - High-Level Research-Oriented Traditional Chinese Medicine Hospital Construction (C-2023-0901).
Source journal
The Lancet Regional Health: Western Pacific (Medicine-Pediatrics, Perinatology and Child Health)
CiteScore: 8.80
Self-citation rate: 2.80%
Articles published: 305
Review time: 11 weeks
About the journal: The Lancet Regional Health – Western Pacific, a gold open access journal, is an integral part of The Lancet's global initiative advocating for healthcare quality and access worldwide. It aims to advance clinical practice and health policy in the Western Pacific region, contributing to enhanced health outcomes. The journal publishes high-quality original research shedding light on clinical practice and health policy in the region. It also includes reviews, commentaries, and opinion pieces covering diverse regional health topics, such as infectious diseases, non-communicable diseases, child and adolescent health, maternal and reproductive health, aging health, mental health, the health workforce and systems, and health policy.