Using machine learning algorithms to predict colorectal polyps
Xingjian Xiao, Shiyou Liu, Kubra Maqsood, Xiaohan Yi, Guoqun Xie, Hailei Zhao, Bo Sun, Jianying Mao, Xianglong Xu
The Lancet Regional Health: Western Pacific, volume 55, Article 101356. Published 2025-02-01.
DOI: 10.1016/j.lanwpc.2024.101356
https://www.sciencedirect.com/science/article/pii/S266660652400350X
Abstract
Background
Colorectal cancer (CRC) is the third most common cancer worldwide, and colorectal polyps (CRP) are a necessary step on the pathway to CRC. Surveys indicate that the global prevalence of colorectal polyps is 20% at age 45, exceeding 50% to 60% by age 85. In China, the prevalence of colorectal polyps among residents is approximately 18.1% and rises with age. To date, no studies have used non-invasive factors to predict colorectal polyps.
Methods
This study used data from a population-based cross-sectional survey. We included 5,461 cases with colonoscopy results from among 49,701 initial positive consultations in the colorectal cancer screening project conducted in Baoshan District, Shanghai, from 2013 to 2021. Multiple machine learning models, including an adaptive boosting classifier and a gradient boosting machine, were built to predict colorectal polyps. For the outcome, patients diagnosed with colorectal polyps on the basis of clinical colonoscopy results, pathological findings, and imaging techniques were classified as having colorectal polyps. A model was considered acceptable for predicting colorectal polyps if its area under the curve (AUC) exceeded 0.7. The optimal model was used to identify predictors of colorectal polyps.
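The AUC criterion described in the Methods can be illustrated with a minimal sketch. It computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, then applies the 0.7 threshold. The function names, the strict "greater than" comparison (a reading of "exceeding 0.7"), and the toy labels and scores are assumptions made for illustration; they are not the study's code or data.

```python
from typing import Sequence

ACCEPTABLE_AUC = 0.7  # threshold stated in the Methods


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def is_acceptable(auc: float, threshold: float = ACCEPTABLE_AUC) -> bool:
    """Assumed reading of the criterion: strictly greater than the threshold."""
    return auc > threshold


# Toy data, invented for illustration (1 = polyp found, 0 = no polyp).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}, acceptable = {is_acceptable(auc)}")
```

The pairwise loop is quadratic in the number of cases, which is fine for a sketch; a rank-based computation would be preferable at the scale of several thousand cases.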
Findings
Non-invasive predictors such as sociodemographic information, behavioural history, and medical history were used to predict the current occurrence of colorectal polyps. In predicting the occurrence of colorectal polyps, the AUC reached 0.71 for Random Forest and eXtreme Gradient Boosting, and 0.7 for Adaptive Boosting Machine, Gradient Boosting Machine, and Light Gradient Boosting Machine. Age, smoking, gender, cancer history, fecal occult blood test (FOBT), occupation, and education level were important predictors of colorectal polyps.
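The abstract does not state how predictor importance was derived from the optimal model. One common, model-agnostic option is permutation importance: shuffle one predictor at a time and measure how much the AUC drops. The sketch below assumes that technique purely for illustration. The scoring function `toy_model`, the records, and the coding of each variable are hypothetical; only the predictor names (age, smoking, education level) are taken from the abstract.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(
    model: Callable[[Row], float],
    rows: List[Row],
    labels: Sequence[int],
    features: Sequence[str],
    n_repeats: int = 20,
    seed: int = 0,
) -> Dict[str, float]:
    """Mean drop in AUC when one feature's values are shuffled across rows."""
    rng = random.Random(seed)
    baseline = roc_auc(labels, [model(r) for r in rows])
    importances: Dict[str, float] = {}
    for name in features:
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            permuted = [{**r, name: v} for r, v in zip(rows, shuffled)]
            drops.append(baseline - roc_auc(labels, [model(r) for r in permuted]))
        importances[name] = sum(drops) / n_repeats
    return importances


def toy_model(row: Row) -> float:
    """Hypothetical risk score: uses age and smoking, ignores education."""
    return row["age"] / 10 + row["smoking"]


# Invented records (1 = polyp found, 0 = no polyp); not study data.
rows = [
    {"age": 45, "smoking": 0, "education": 2},
    {"age": 50, "smoking": 0, "education": 1},
    {"age": 55, "smoking": 1, "education": 3},
    {"age": 60, "smoking": 0, "education": 2},
    {"age": 65, "smoking": 1, "education": 1},
    {"age": 70, "smoking": 0, "education": 3},
    {"age": 75, "smoking": 1, "education": 2},
    {"age": 48, "smoking": 0, "education": 3},
]
labels = [0, 0, 0, 1, 1, 1, 1, 0]

importance = permutation_importance(
    toy_model, rows, labels, ["age", "smoking", "education"]
)
for name, drop in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: mean AUC drop = {drop:.3f}")
```

Because the toy score never reads `education`, shuffling that column leaves the AUC unchanged and its importance is exactly zero, which serves as a quick sanity check on the procedure.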
Interpretation
Non-invasive factors combined with machine learning algorithms can accurately predict the occurrence of colorectal polyps in individuals with positive initial screening results. The rate of colonoscopy examination is very low, even among individuals with positive initial screening results. In this context, our machine learning predictive models may help prompt patients to undergo further examinations and interventions, thereby improving early diagnosis and treatment. We propose a machine learning approach that can identify individuals with colorectal polyps in this group, thereby increasing the screening rate for colorectal cancer and helping to prevent the disease.
Funding
This study was supported by Health Promotion and Education of the Key Medical Specialty of Baoshan District, Shanghai (BSZK-2023-BZ14), the Traditional Chinese Medicine Research Project of Shanghai Municipal Health Commission (20240N108), and Construction of Traditional Chinese Medicine Inheritance and Innovation Development Demonstration Pilot Projects in Pudong New Area - High-Level Research-Oriented Traditional Chinese Medicine Hospital Construction (C-2023-0901).
About the journal:
The Lancet Regional Health – Western Pacific, a gold open access journal, is an integral part of The Lancet's global initiative advocating for healthcare quality and access worldwide. It aims to advance clinical practice and health policy in the Western Pacific region, contributing to enhanced health outcomes. The journal publishes high-quality original research shedding light on clinical practice and health policy in the region. It also includes reviews, commentaries, and opinion pieces covering diverse regional health topics, such as infectious diseases, non-communicable diseases, child and adolescent health, maternal and reproductive health, aging health, mental health, the health workforce and systems, and health policy.