Na Hyeon Yu, Daeun Shin, Ik Hee Ryu, Tae Keun Yoo, Kyungmin Koh
BMC Medical Informatics and Decision Making, 25(1):118 (impact factor 3.3; JCR Q2, Medical Informatics). Journal article, published 2025-03-07. DOI: 10.1186/s12911-025-02950-8 (https://doi.org/10.1186/s12911-025-02950-8). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11889835/pdf/
Citations: 0
Abstract
Retinal vein occlusion risk prediction without fundus examination using a no-code machine learning tool for tabular data: a nationwide cross-sectional study from South Korea.
Background: Retinal vein occlusion (RVO) is a leading cause of vision loss globally. Routine health check-up data, including demographic information, medical history, and laboratory test results, are commonly used in clinical settings for disease risk assessment. This study aimed to develop a machine learning model to predict RVO risk in the general population using such tabular health data, without requiring coding expertise or retinal imaging.
Methods: We utilized data from the Korea National Health and Nutrition Examination Surveys (KNHANES) collected between 2017 and 2020 to develop the RVO prediction model, with external validation performed using independent data from KNHANES 2021. Model construction was conducted using Orange Data Mining, an open-source, code-free, component-based tool with a user-friendly interface, and Google Vertex AI. An easy-to-use oversampling function was employed to address class imbalance, enhancing the usability of the workflow. Various machine learning algorithms were trained by incorporating all features from the health check-up data in the development set. The primary outcome was the area under the receiver operating characteristic curve (AUC) for identifying RVO.
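The abstract does not say how the oversampling function works internally or how the AUC was calculated. As a minimal sketch only, assuming simple random oversampling with replacement and the rank-based (Mann-Whitney) definition of AUC, the two steps could be written in plain Python as follows. The function names `random_oversample` and `auc_score`, and all of the toy numbers, are hypothetical and are not taken from the study or from KNHANES.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (assumed strategy)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def auc_score(labels, scores):
    """AUC as the share of (positive, negative) pairs in which the
    positive case has the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up records: [age, systolic blood pressure]; label 1 = RVO, 0 = no RVO.
rows = [[63, 148], [41, 118], [55, 131], [37, 112], [70, 152], [48, 125]]
labels = [1, 0, 0, 0, 1, 0]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))

# Made-up model scores for the six original records.
scores = [0.81, 0.12, 0.44, 0.09, 0.67, 0.30]
print(auc_score(labels, scores))
```

In a sketch like this, oversampling would be applied to the development set only, so that the validation data keep their natural class balance.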
Results: All machine learning training was completed without the need for coding experience. An artificial neural network (ANN) with a ReLU activation function, developed using Orange Data Mining, demonstrated superior performance, achieving an AUC of 0.856 (95% confidence interval [CI], 0.835-0.875) in internal validation and 0.784 (95% CI, 0.763-0.803) in external validation. The ANN outperformed logistic regression and Google Vertex AI models, though differences were not statistically significant in internal validation. In external validation, the ANN showed a marginally significant improvement over logistic regression (P = 0.044), with no significant difference compared to Google Vertex AI. Key predictive variables included age, household income, and blood pressure-related factors.
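The abstract reports 95% confidence intervals for the AUC but does not state how they were derived. One common option is a percentile bootstrap, illustrated here as a hedged sketch under that assumption, not as the authors' procedure. The helper names `auc_score` and `bootstrap_auc_ci` and the simulated scores are hypothetical.

```python
import random


def auc_score(labels, scores):
    """Rank-based AUC: share of positive/negative pairs ranked correctly
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    if len(set(labels)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains a single class; AUC is undefined
        stats.append(auc_score(ys, [scores[i] for i in idx]))
    stats.sort()
    low = stats[int((alpha / 2) * n_boot)]
    high = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return low, high


# Simulated scores: 20 positives and 100 negatives (not study data).
rng = random.Random(42)
labels = [1] * 20 + [0] * 100
scores = [rng.gauss(0.6, 0.2) for _ in range(20)] + [rng.gauss(0.4, 0.2) for _ in range(100)]

point = auc_score(labels, scores)
low, high = bootstrap_auc_ci(labels, scores, n_boot=500)
print(f"AUC {point:.3f} (95% CI {low:.3f}-{high:.3f})")
```

Fixing the seed makes the interval reproducible; a larger `n_boot` gives a more stable interval at the cost of run time.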
Conclusion: This study demonstrates the feasibility of developing an accessible, cost-effective RVO risk prediction tool using health check-up data and no-code machine learning platforms. Such a tool has the potential to enhance early detection and preventive strategies in general healthcare settings, thereby improving patient outcomes.
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.