Lan Lan;Guoliang Hu;Rui Li;Tingting Wang;Lingling Jiang;Jiawei Luo;Zhiwei Ji;Yilong Wang
DOI: 10.26599/TST.2023.9010092. Journal: Tsinghua Science and Technology (IF 6.6, JCR Q1). Published: 2024-03-02. Full text: https://ieeexplore.ieee.org/document/10517920/
Machine Learning for Selecting Important Clinical Markers of Imaging Subgroups of Cerebral Small Vessel Disease Based on a Common Data Model
Differences among the imaging subgroups of cerebral small vessel disease (CSVD) need further exploration. First, we use propensity score matching to obtain balanced datasets. Random forest (RF) is then adopted to classify the subgroups and to select features, and is compared with support vector machine (SVM) and extreme gradient boosting (XGBoost). The top 10 features by importance are entered into a stepwise logistic regression to obtain the odds ratio (OR) and 95% confidence interval (CI). The dataset contains 41 290 adult inpatient records with a diagnosis of CSVD. RF reaches an accuracy and area under the curve (AUC) close to 0.7, the best classification performance among RF, SVM, and XGBoost. The ORs (95% CIs) of hematocrit for white matter lesions (WMLs), lacunes, microbleeds, atrophy, and enlarged perivascular space (EPVS) are 0.9875 (0.9857–0.9893), 0.9728 (0.9705–0.9752), 0.9782 (0.9740–0.9824), 1.0093 (1.0081–1.0106), and 0.9716 (0.9597–0.9832), respectively. The ORs (95% CIs) of red cell distribution width for WMLs, lacunes, atrophy, and EPVS are 0.9600 (0.9538–0.9662), 0.9630 (0.9559–0.9702), 1.0751 (1.0686–1.0817), and 0.9304 (0.8864–0.9755), respectively. The ORs (95% CIs) of platelet distribution width for WMLs, lacunes, and microbleeds are 1.1796 (1.1636–1.1958), 1.1663 (1.1476–1.1853), and 1.0416 (1.0152–1.0687), respectively. This study proposes a new analytical framework for selecting important clinical markers of CSVD using machine learning based on a common data model; the framework offers low cost, fast speed, a large sample size, and continuous data sources.
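To make the reported odds ratios easier to read, the sketch below shows how an OR and its 95% confidence interval relate to a logistic regression coefficient. It assumes a Wald-type interval, exp(beta ± z·SE), which the abstract does not confirm; the function names `odds_ratio_ci` and `coef_from_or_ci` are hypothetical and are not taken from the paper. The only data used are the hematocrit figures for WMLs quoted above.

```python
import math

# Two-sided 95% quantile of the standard normal distribution.
Z_95 = 1.959963984540054


def odds_ratio_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) for a logistic regression coefficient.

    Assumes a Wald interval on the log-odds scale (an assumption,
    not something the abstract states).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_or_ci(odds_ratio, lower, upper, z=Z_95):
    """Recover (beta, se) from a reported OR and its interval.

    Under the Wald assumption the interval is symmetric on the log scale,
    so its log-width equals 2 * z * se.
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Hematocrit for WMLs, as reported in the abstract: 0.9875 (0.9857-0.9893).
beta, se = coef_from_or_ci(0.9875, 0.9857, 0.9893)
or_hat, ci_low, ci_high = odds_ratio_ci(beta, se)

print(f"beta = {beta:.5f}, SE = {se:.5f}")
print(f"OR = {or_hat:.4f} (95% CI {ci_low:.4f} to {ci_high:.4f})")
```

Under this assumption, an OR below 1 corresponds to a negative coefficient: each one-unit increase in hematocrit multiplies the odds of the WML subgroup by about 0.9875. An interval that excludes 1, as this one does, corresponds to a coefficient whose 95% interval excludes 0.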
About the journal:
Tsinghua Science and Technology (Tsinghua Sci Technol) began publication in 1996. It is an international academic journal sponsored by Tsinghua University and published bimonthly. The journal aims to present up-to-date scientific achievements in computer science, electronic engineering, and other IT fields. Contributions from all over the world are welcome.