Construction and validation of a machine learning algorithm-based predictive model for difficult colonoscopy insertion
Authors: Ren-Xuan Gao, Xin-Lei Wang, Ming-Jie Tian, Xiao-Ming Li, Jia-Jia Zhang, Jun-Jing Wang, Jing Gao, Chao Zhang, Zhi-Ting Li
Journal: World Journal of Gastrointestinal Endoscopy, volume 17, issue 7, article 108307 (Journal Article; published 2025-07-16)
Impact factor: 1.8; JCR: Q4 (Gastroenterology & Hepatology)
DOI: 10.4253/wjge.v17.i7.108307 (https://doi.org/10.4253/wjge.v17.i7.108307)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12264806/pdf/
Citations: 0
Abstract
Background: Difficulty of colonoscopy insertion (DCI) significantly affects colonoscopy effectiveness and serves as a key quality indicator. Predicting and evaluating DCI risk preoperatively is crucial for optimizing intraoperative strategies.
Aim: To evaluate the predictive performance of machine learning (ML) algorithms for DCI by comparing three modeling approaches, identify factors influencing DCI, and develop a preoperative prediction model using ML algorithms to enhance colonoscopy quality and efficiency.
Methods: This cross-sectional study enrolled 712 patients who underwent colonoscopy at a tertiary hospital between June 2020 and May 2021. Demographic data, past medical history, medication use, and psychological status were collected. The endoscopist assessed DCI using the visual analogue scale. After univariate screening, predictive models were developed using multivariable logistic regression, least absolute shrinkage and selection operator (LASSO) regression, and random forest (RF) algorithms. Model performance was evaluated based on discrimination, calibration, and decision curve analysis (DCA), and results were visualized using nomograms.
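The three evaluation ideas named in the Methods (discrimination, threshold-based classification, and decision curve analysis) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' code: the function names and the toy labels and probabilities are hypothetical, and the net-benefit formula is the standard one used in decision curve analysis, assumed here to match what the study applied.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float], threshold: float):
    """Return (tp, fp, tn, fn) when a case is called positive if prob >= threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, y_prob, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, y_prob) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_prob, threshold) -> float:
    """Net benefit at threshold pt: TP/n - (FP/n) * pt / (1 - pt)."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical toy data (1 = difficult insertion), not taken from the study.
TOY_LABELS = [1, 1, 1, 0, 0, 0, 1, 0]
TOY_PROBS = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

if __name__ == "__main__":
    print("AUC:", auc(TOY_LABELS, TOY_PROBS))
    print("sens/spec at 0.5:", sensitivity_specificity(TOY_LABELS, TOY_PROBS, 0.5))
    print("net benefit at 0.5:", net_benefit(TOY_LABELS, TOY_PROBS, 0.5))
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing the result with the treat-all and treat-none strategies.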
Results: A total of 712 patients (53.8% male; mean age 54.5 ± 12.9 years) were included. Logistic regression analysis identified constipation [odds ratio (OR) = 2.254, 95% confidence interval (CI): 1.289-3.931], abdominal circumference (AC) (77.5-91.9 cm: OR = 1.895, 95% CI: 1.065-3.350; ≥ 92 cm: OR = 1.271, 95% CI: 0.730-2.188), and anxiety (OR = 1.071, 95% CI: 1.044-1.100) as predictive factors for DCI; the LASSO and RF methods supported the same factors. For the multivariable logistic regression, LASSO, and RF models, respectively, training/validation sensitivities were 0.826/0.925, 0.924/0.868, and 1.000/0.981; specificities were 0.602/0.511, 0.510/0.562, and 0.977/0.526; and areas under the receiver operating characteristic curve (AUCs) were 0.780 (0.737-0.823)/0.726 (0.654-0.799), 0.754 (0.710-0.798)/0.723 (0.656-0.791), and 1.000 (1.000-1.000)/0.754 (0.688-0.820). DCA indicated optimal net benefit within probability thresholds of 0-0.9 and 0.05-0.37. The RF model demonstrated superior diagnostic accuracy, reflected by perfect training sensitivity (1.000) and the highest validation AUC (0.754), outperforming the other methods in clinical applicability.
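The reported metrics are easier to compare once tabulated per model. The sketch below transcribes the training/validation figures from the Results and derives two simple summaries: Youden's J (sensitivity + specificity - 1) on the validation set, and the training-minus-validation AUC gap. It assumes each triple of values is ordered logistic regression, LASSO, RF, following the order in the Methods; the abstract itself confirms only that the third set belongs to RF. The dictionary keys and function names are hypothetical.

```python
# Values transcribed from the Results paragraph as (training, validation) pairs.
# Assumption: each reported triple is ordered logistic regression, LASSO, RF.
REPORTED = {
    "logistic": {
        "sensitivity": (0.826, 0.925),
        "specificity": (0.602, 0.511),
        "auc": (0.780, 0.726),
    },
    "lasso": {
        "sensitivity": (0.924, 0.868),
        "specificity": (0.510, 0.562),
        "auc": (0.754, 0.723),
    },
    "rf": {
        "sensitivity": (1.000, 0.981),
        "specificity": (0.977, 0.526),
        "auc": (1.000, 0.754),
    },
}


def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def summarize(reported: dict) -> dict:
    """Per model: validation Youden's J and training-minus-validation AUC gap."""
    summary = {}
    for model, metrics in reported.items():
        train_auc, valid_auc = metrics["auc"]
        valid_sens = metrics["sensitivity"][1]
        valid_spec = metrics["specificity"][1]
        summary[model] = {
            "validation_youden_j": round(youden_j(valid_sens, valid_spec), 3),
            "auc_gap": round(train_auc - valid_auc, 3),
        }
    return summary


if __name__ == "__main__":
    for model, row in summarize(REPORTED).items():
        print(model, row)
```

Because training metrics are computed on the data used to fit each model, the validation columns are the more informative basis for comparing the three approaches.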
Conclusion: The RF-based model exhibited superior predictive accuracy for DCI compared to multivariable logistic and LASSO regression models. This approach supports individualized preoperative optimization, enhancing colonoscopy quality through targeted risk stratification.