Development of transient ischemic attack risk prediction model suitable for initializing a learning health system unit using electronic medical records.

IF 3.8 | Zone 3, Medicine | Q2, MEDICAL INFORMATICS

BMC Medical Informatics and Decision Making Pub Date : 2024-12-18 DOI:10.1186/s12911-024-02767-x

Jian Wen, Tianmei Zhang, Shangrong Ye, Cheng Li, Ruobing Han, Ran Huang, Bairong Shen, Anjun Chen, Qinghua Li

BMC Medical Informatics and Decision Making, 24(1), 392. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11657208/pdf/

Citations: 0

Abstract

Background: Patients with transient ischemic attack (TIA) face a significantly increased risk of stroke. However, TIA screening and early detection rates are low, especially in developing countries. This study aims to develop an inclusive and practical TIA risk prediction model using machine learning (ML) that performs well in both hospital and resource-limited clinic settings. This model is essential for initiating the first ML-enabled learning health system (LHS) unit designed for routine and equitable TIA screening and early detection across broad populations.

Methods: Employing a novel protocol, this study first standardized data from a hospital's electronic medical records (EMR) to construct inclusive TIA risk prediction ML models using a data-centric approach. Subsequently, a quantitative distribution of TIA risk factors was applied in feature engineering to reduce the number of variables for a practical ML model. This refined model initiated a TIA ML-LHS unit that is capable of continuously updating with new EMR data from hospitals and clinics. Additionally, the practical model underwent external validation using data from another hospital.
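The abstract does not spell out how the "quantitative distribution of TIA risk factors" was used to cut the variable count. As a rough illustration only, one plausible reading is to rank variables by how often they occur among TIA-positive patients and keep the most frequent ones. The function name `top_k_risk_factors`, the record layout, and the toy data below are all hypothetical and are not taken from the study.

```python
from collections import Counter


def top_k_risk_factors(records, labels, k):
    """Rank binary variables by prevalence among positive cases; keep the top k.

    records: list of dicts mapping variable name -> 0/1 (hypothetical layout)
    labels:  list of 0/1 outcomes (1 = TIA) aligned with records
    Returns a list of (variable, prevalence_among_positives) pairs.
    """
    counts = Counter()
    n_pos = 0
    for rec, y in zip(records, labels):
        if y == 1:
            n_pos += 1
            counts.update(name for name, present in rec.items() if present)
    if n_pos == 0:
        return []
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, c / n_pos) for name, c in ranked[:k]]


# Made-up toy records, for illustration only.
records = [
    {"hypertension": 1, "diabetes": 0, "smoking": 1},
    {"hypertension": 1, "diabetes": 1, "smoking": 0},
    {"hypertension": 0, "diabetes": 0, "smoking": 1},
    {"hypertension": 1, "diabetes": 0, "smoking": 0},
]
labels = [1, 1, 0, 1]

selected = top_k_risk_factors(records, labels, k=2)
for name, prevalence in selected:
    print(f"{name}: {prevalence:.2f}")
```

In a real pipeline the reduced variable list would then be used to retrain the classifier (XGBoost in the study), and performance would be compared against the full-variable model.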

Results: The inclusive 150-variable ML models, derived from all available EMR variables for TIA, achieved a recall of 0.868 and an accuracy of 0.886 in predicting TIA risk. Further feature engineering produced a practical XGBoost model with 20 variables, maintaining acceptable performance of 0.855 recall and 0.796 accuracy. The initialized TIA ML-LHS unit, based on the practical model, achieved performance metrics of 0.830 recall, 0.726 precision, 0.816 ROC-AUC, and 0.812 accuracy. The model also performed well in external validation, confirming its effectiveness with patient data from different clinical settings.
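For readers less familiar with the reported metrics, the sketch below shows how recall, precision, and accuracy follow from confusion-matrix counts, and how ROC-AUC can be computed from predicted scores as the probability that a randomly chosen positive outranks a randomly chosen negative. The counts and scores are invented for illustration and are not the study's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute recall, precision, and accuracy from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,        # share of true cases found
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,     # share of alerts that are true
        "accuracy": (tp + tn) / total if total else 0.0,       # share of all calls that are right
    }


def roc_auc(scores, labels):
    """Pairwise ROC-AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, not from the paper.
metrics = classification_metrics(tp=80, fp=30, fn=20, tn=70)
auc = roc_auc(scores=[0.9, 0.8, 0.4, 0.3], labels=[1, 0, 1, 0])
print(metrics, auc)
```

Recall is the metric to watch in a screening setting, because a missed TIA case (false negative) is usually costlier than an unnecessary follow-up (false positive). That may be why the abstract leads with recall for each model.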

Conclusions: This study developed the first inclusive and practical TIA XGBoost model from a hospital's full EMR data and initiated the first TIA risk prediction ML-LHS unit. This TIA model, which requires only 20 variables, enables the ML-LHS to serve not only patients in hospitals but also those in resource-limited clinics. These results have significant implications for expanding risk-based TIA screening in community and rural clinics, thereby enhancing early detection of TIA among underserved populations and improving health equity. The novel protocol used in this study is also applicable for initiating ML-LHS units for various preventable diseases, providing a new system-level approach to responsible AI development and applications.


Source journal

BMC Medical Informatics and Decision Making (Medicine: Medical Informatics)

CiteScore: 7.20
Self-citation rate: 5.70%
Articles published: 297
Review time: 1 month

About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.