Predicting early-stage coronary artery disease using machine learning and routine clinical biomarkers improved by augmented virtual data

Angela Koloi, Vasileios S. Loukas, Cillian Hourican, A. Sakellarios, Rick Quax, Pashupati P. Mishra, T. Lehtimäki, Olli T. Raitakari, C. Papaloukas, Jos A. Bosch, Winfried März, D. Fotiadis
European Heart Journal - Digital Health. Journal article, published 2024-08-09.
DOI: 10.1093/ehjdh/ztae049 (https://doi.org/10.1093/ehjdh/ztae049)

Abstract

Coronary artery disease (CAD) is a highly prevalent disease with modifiable risk factors. In patients with suspected obstructive CAD, evaluating the pre-test probability model is crucial for diagnosis, although its accuracy remains controversial. Machine learning (ML) predictive models can help clinicians detect CAD early and improve outcomes. This study aimed to identify early-stage CAD using ML in conjunction with a panel of clinical and laboratory tests.

The study sample included 3316 patients enrolled in the Ludwigshafen Risk and Cardiovascular Health (LURIC) study. A comprehensive array of attributes was considered, and an ML pipeline was developed. Subsequently, we used five approaches to generate high-quality virtual patient data to improve the performance of the artificial intelligence models. An extension study was carried out using data from the Young Finns Study (YFS) to assess the generalizability of the results. After the virtual augmented data were applied, accuracy increased by approximately 5%, from 0.75 to 0.79 for random forests (RFs) and from 0.76 to 0.80 for gradient boosting (GB). Sensitivity rose by about 9.4% for RFs (from 0.81 to 0.89) and by 4.8% for GB (from 0.83 to 0.87). Specificity rose by about 24% for RFs (from 0.55 to 0.70) and by 37% for GB (from 0.51 to 0.74). The extension analysis aligned with the initial study.

Accurate predictions of angiographic CAD can be obtained using a set of routine laboratory markers, age, sex, and smoking status, holding the potential to limit the need for invasive diagnostic techniques. The extension analysis in the YFS demonstrated the potential of these findings in a younger population, and it confirmed applicability to atherosclerotic vascular disease.

Using virtual population generation techniques, this study improved the accuracy of a machine learning model designed to identify early-stage CAD using standard laboratory tests.
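The abstract reports accuracy, sensitivity, and specificity, along with relative increases in each. As a reading aid, the following is a minimal sketch of how these quantities are conventionally defined from a binary confusion matrix. It is not the authors' pipeline: the class and function names are illustrative, and the patient counts are hypothetical values chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts (CAD = positive class)."""
    tp: int  # CAD patients correctly flagged
    fp: int  # non-CAD patients incorrectly flagged
    tn: int  # non-CAD patients correctly cleared
    fn: int  # CAD patients missed


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all patients classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true CAD cases that the model detects."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of non-CAD cases that the model correctly rules out."""
    return c.tn / (c.tn + c.fp)


def relative_change(before: float, after: float) -> float:
    """Relative increase from `before` to `after` (0.05 means 5%)."""
    return (after - before) / before


# Hypothetical counts for illustration only; not taken from LURIC or YFS.
counts = ConfusionCounts(tp=80, fp=30, tn=70, fn=20)

print("accuracy:", accuracy(counts))
print("sensitivity:", sensitivity(counts))
print("specificity:", specificity(counts))

# Relative change applied to the rounded RF accuracies quoted in the abstract.
print("relative accuracy change:", relative_change(0.75, 0.79))
```

Applied to the rounded accuracies in the abstract (0.75 to 0.79), `relative_change` gives a little over 5%, in line with the reported figure. Percentages recomputed from the rounded sensitivity and specificity values will not necessarily match the reported ones, which may have been derived from unrounded results.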