Construction of machine learning-based models for screening the high-risk patients with gastric precancerous lesions

IF 5.3, Zone 3 (Medicine), Q1, INTEGRATIVE & COMPLEMENTARY MEDICINE

Chinese Medicine. Pub Date: 2025-01-07. DOI: 10.1186/s13020-025-01059-4

Shuxian Yu, Haiyang Jiang, Jing Xia, Jie Gu, Mengting Chen, Yan Wang, Xiaohong Zhao, Zehua Liao, Puhua Zeng, Tian Xie, Xinbing Sui

Chinese Medicine, volume 20(1), page 7. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11705657/pdf/

Citations: 0

Abstract


Construction of machine learning-based models for screening the high-risk patients with gastric precancerous lesions.

Background: The individualized prediction and discrimination of precancerous lesions of gastric cancer (PLGC) are critical for the early prevention of gastric cancer (GC). However, accurate non-invasive methods for distinguishing between PLGC and GC are currently lacking. This study therefore aimed to develop a risk prediction model using machine learning and deep learning techniques to aid the early diagnosis of GC.

Methods: A total of 2229 subjects were recruited from nine tertiary hospitals between October 2022 and November 2023. We designed a comprehensive questionnaire, identified statistically significant factors, and created a web-based nomogram. A risk prediction model was then developed using machine learning techniques. In addition, a tongue image-based risk prediction model was established using deep learning algorithms.
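The web-based chart described above corresponds to the nomogram that the Results derive from logistic regression. As a minimal sketch of what such a tool computes, the snippet below turns a few questionnaire answers into a predicted probability. The intercept, coefficient values, and feature names (`age_per_10_years`, `irregular_meals`, `pickled_food`) are hypothetical placeholders chosen for illustration; they are not the study's fitted values.

```python
import math
from typing import Mapping

# Hypothetical coefficients for illustration only; NOT taken from the study.
INTERCEPT = -3.0
COEFFICIENTS = {
    "age_per_10_years": 0.4,  # age expressed in decades
    "irregular_meals": 0.5,   # 1 = yes, 0 = no
    "pickled_food": 0.5,      # 1 = yes, 0 = no
}


def predicted_risk(features: Mapping[str, float]) -> float:
    """Return the logistic-regression probability for one subject."""
    # Linear predictor: intercept plus the weighted sum of the feature values.
    linear_predictor = INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )
    # The logistic (sigmoid) function maps the linear predictor to (0, 1).
    return 1.0 / (1.0 + math.exp(-linear_predictor))


if __name__ == "__main__":
    subject = {"age_per_10_years": 5, "irregular_meals": 1, "pickled_food": 1}
    print(f"Predicted risk: {predicted_risk(subject):.3f}")
```

A nomogram presents the same calculation graphically: each factor contributes points in proportion to its coefficient, and the point total is read off against a probability scale.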

Results: Based on logistic regression analysis, a dynamic web-based nomogram was developed; it is freely accessible at https://yz6677.shinyapps.io/GC67/ . The prediction model was then built using ten different machine learning algorithms, of which the Random Forest (RF) model achieved the highest accuracy, 85.65%. According to the predictive results, the top 10 key risk factors were age, traditional Chinese medicine (TCM) constitution type, tongue coating color, tongue color, irregular meals, pickled food, greasy tongue coating, a habit of eating overly hot food, anxiety, and sleep onset latency. These factors are all significant risk indicators for progression from PLGC to GC. Subsequently, the Swin Transformer architecture was used to develop a tongue image-based model for predicting the risk of PLGC progression. The verification set showed an accuracy of 73.33% and an area under the curve (AUC) greater than 0.8 across all models.
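The two metrics reported above, accuracy and AUC, measure different things: accuracy depends on a decision threshold, while AUC summarizes how well the scores rank positives above negatives. The sketch below computes both in plain Python. The labels and scores are toy values invented for illustration, not study data, and the 0.5 threshold is an assumption.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of subjects whose thresholded score matches the true label."""
    predictions = [1 if score >= threshold else 0 for score in scores]
    correct = sum(pred == label for pred, label in zip(predictions, labels))
    return correct / len(labels)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (made-up values): 1 = progressed, 0 = did not progress.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

if __name__ == "__main__":
    print(f"accuracy = {accuracy(toy_labels, toy_scores):.4f}")
    print(f"AUC      = {auc(toy_labels, toy_scores):.4f}")
```

The pairwise loop is quadratic in the number of subjects, which is fine for a sketch; a rank-based formulation gives the same value more efficiently on large data sets.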

Conclusions: Our study developed machine learning and deep learning-based models for predicting the risk of progression from PLGC to GC, which can help identify high-risk patients among those with PLGC and improve the early diagnosis of GC.


Source journal

Chinese Medicine INTEGRATIVE & COMPLEMENTARY MEDICINE-PHARMACOLOGY & PHARMACY

CiteScore

7.90

Self-citation rate

4.10%

Articles published

133

Review time

31 weeks

About the journal: Chinese Medicine is an open access, online journal publishing evidence-based, scientifically justified, and ethical research into all aspects of Chinese medicine. Areas of interest include recent advances in herbal medicine, clinical nutrition, clinical diagnosis, acupuncture, pharmaceutics, biomedical sciences, epidemiology, education, informatics, sociology, and psychology that are relevant and significant to Chinese medicine. Examples of research approaches include biomedical experimentation, high-throughput technology, clinical trials, systematic reviews, meta-analysis, sampled surveys, simulation, data curation, statistics, omics, translational medicine, and integrative methodologies. Chinese Medicine is a credible channel to communicate unbiased scientific data, information, and knowledge in Chinese medicine among researchers, clinicians, academics, and students in Chinese medicine and other scientific disciplines of medicine.