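Machine Learning for Selecting Important Clinical Markers of Imaging Subgroups of Cerebral Small Vessel Disease Based on a Common Data Model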

IF 6.6 · Zone 1 (Computer Science) · Q1 (Multidisciplinary)
Lan Lan;Guoliang Hu;Rui Li;Tingting Wang;Lingling Jiang;Jiawei Luo;Zhiwei Ji;Yilong Wang
DOI: 10.26599/TST.2023.9010092
Journal: Tsinghua Science and Technology
Publication date: 2024-03-02
Publication type: Journal Article
Full text: https://ieeexplore.ieee.org/document/10517920/
Citations: 0

Machine Learning for Selecting Important Clinical Markers of Imaging Subgroups of Cerebral Small Vessel Disease Based on a Common Data Model
Differences among the imaging subgroups of cerebral small vessel disease (CSVD) need further exploration. First, we use propensity score matching to obtain balanced datasets. We then adopt random forest (RF) to classify the subgroups and select features, comparing it with support vector machine (SVM) and extreme gradient boosting (XGBoost). The top 10 important features are entered into stepwise logistic regression to obtain odds ratios (ORs) and 95% confidence intervals (CIs). The dataset comprises 41 290 adult inpatient records with a CSVD diagnosis. The accuracy and area under the curve (AUC) of RF are close to 0.7, the best classification performance among the three models. The ORs and 95% CIs of hematocrit for white matter lesions (WMLs), lacunes, microbleeds, atrophy, and enlarged perivascular space (EPVS) are 0.9875 (0.9857–0.9893), 0.9728 (0.9705–0.9752), 0.9782 (0.9740–0.9824), 1.0093 (1.0081–1.0106), and 0.9716 (0.9597–0.9832), respectively. The ORs and 95% CIs of red cell distribution width for WMLs, lacunes, atrophy, and EPVS are 0.9600 (0.9538–0.9662), 0.9630 (0.9559–0.9702), 1.0751 (1.0686–1.0817), and 0.9304 (0.8864–0.9755), respectively. The ORs and 95% CIs of platelet distribution width for WMLs, lacunes, and microbleeds are 1.1796 (1.1636–1.1958), 1.1663 (1.1476–1.1853), and 1.0416 (1.0152–1.0687), respectively. This study proposes a new analytical framework that uses machine learning based on a common data model to select important clinical markers for CSVD; the framework offers low cost, high speed, a large sample size, and continuous data sources.
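To make the reported odds ratios easier to read, the sketch below shows how a logistic-regression coefficient maps to an OR and a 95% interval, and how the coefficient can be backed out of a reported interval. It assumes the intervals are Wald intervals (symmetric on the log-odds scale), which the abstract does not state; the function names `odds_ratio_ci` and `coefficient_from_ci` are hypothetical and are not taken from the paper.

```python
import math
from typing import Tuple

# Standard normal quantile for a two-sided 95% interval.
Z_95 = 1.959963984540054


def odds_ratio_ci(beta: float, se: float, z: float = Z_95) -> Tuple[float, float, float]:
    """Return (OR, lower, upper) for a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coefficient_from_ci(lower: float, upper: float, z: float = Z_95) -> Tuple[float, float]:
    """Back out (beta, se) from an interval on the OR scale.

    Assumption: the interval is a Wald interval, so its midpoint on the
    log scale is the coefficient and its half-width is z * se.
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    beta = (log_lo + log_hi) / 2
    se = (log_hi - log_lo) / (2 * z)
    return beta, se


# Hematocrit vs. white matter lesions, using the interval from the abstract.
beta, se = coefficient_from_ci(0.9857, 0.9893)
or_, lo, hi = odds_ratio_ci(beta, se)
print(f"beta={beta:.5f} se={se:.5f} OR={or_:.4f} CI=({lo:.4f}, {hi:.4f})")
```

Under this assumption, the OR recovered from the interval midpoint agrees with the reported 0.9875 to four decimal places, which suggests the hematocrit interval is consistent with a Wald construction. An OR just below 1 with such a narrow interval reflects a small per-unit effect estimated on a large sample.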
Source journal
Tsinghua Science and Technology
Categories: Computer Science, Information Systems; Computer Science, Software Engineering
CiteScore: 10.20
Self-citation rate: 10.60%
Articles published: 2340
Journal description: Tsinghua Science and Technology (Tsinghua Sci Technol) started publication in 1996. It is an international academic journal sponsored by Tsinghua University and is published bimonthly. The journal aims to present up-to-date scientific achievements in computer science, electronic engineering, and other IT fields. Contributions from all over the world are welcome.