Mohammad R Salmanpour, Mojtaba Shamsaei, Ghasem Hajianfar, Hamid Soltanian-Zadeh, Arman Rahmim
DOI: 10.21037/qims-21-425 (https://doi.org/10.21037/qims-21-425)
Published: 2022-02-01, pages 906-919
Publication type: Journal Article
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8739095/pdf/qims-12-02-906.pdf
Citations: 15
Longitudinal clustering analysis and prediction of Parkinson's disease progression using radiomics and hybrid machine learning.
Background: We employed machine learning approaches to (I) determine distinct progression trajectories in Parkinson's disease (PD) (unsupervised clustering task), and (II) predict progression trajectories (supervised prediction task), from early (years 0 and 1) data, making use of clinical and imaging features.
Methods: We studied PD subjects drawn from longitudinal datasets (years 0, 1, 2, and 4; Parkinson's Progression Markers Initiative). We extracted and analyzed 981 features, including motor, non-motor, and radiomics features extracted for each region of interest (ROI: left/right caudate and putamen) using our standardized environment for radiomics analysis (SERA) software. ROIs on dopamine transporter single-photon emission computed tomography (DAT SPECT) images were segmented via magnetic resonance images (MRI). After performing cross-sectional clustering on 885 subjects (original dataset) to identify disease subtypes, we identified optimal longitudinal trajectories using hybrid machine learning systems (HMLSs), including principal component analysis (PCA) + K-means algorithm (KMA), followed by the Bayesian information criterion (BIC), Calinski-Harabasz criterion (CHC), and elbow criterion (EC). Subsequently, the identified trajectories were predicted from early-year data using multiple HMLSs comprising 16 dimension reduction algorithms (DRAs) and 10 classification algorithms.
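The PCA + K-means step with a cluster-count criterion can be illustrated with a minimal sketch. It assumes scikit-learn and synthetic data; the function names, the number of principal components, and the toy feature matrix are hypothetical and are not taken from the study. Only the Calinski-Harabasz and elbow (inertia) criteria are shown, because scikit-learn's K-means does not provide a BIC directly.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import calinski_harabasz_score
from sklearn.preprocessing import StandardScaler


def score_cluster_counts(features, k_values, n_components=5, seed=0):
    """Return {k: (inertia, calinski_harabasz)} for K-means on PCA-reduced data."""
    # Standardize so that no single feature dominates the principal components.
    scaled = StandardScaler().fit_transform(features)
    reduced = PCA(n_components=n_components, random_state=seed).fit_transform(scaled)
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(reduced)
        # Inertia feeds the elbow criterion; a higher Calinski-Harabasz score is better.
        ch = calinski_harabasz_score(reduced, model.labels_)
        scores[k] = (model.inertia_, ch)
    return scores


def make_toy_features(n_per_group=100, seed=0):
    """Synthetic stand-in for the feature matrix: three well-separated groups."""
    rng = np.random.default_rng(seed)
    centers = np.zeros((3, 10))
    centers[1, :5] = 8.0
    centers[2, 5:] = 8.0
    return np.vstack([c + rng.normal(size=(n_per_group, 10)) for c in centers])


toy = make_toy_features()
scores = score_cluster_counts(toy, k_values=range(2, 7))
best_k = max(scores, key=lambda k: scores[k][1])
print("k with highest Calinski-Harabasz score:", best_k)
```

For the elbow criterion, one would plot the inertia values in `scores` against k and look for the point where the decrease flattens.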
Results: We identified 3 distinct progression trajectories. Hotelling's T-squared test (HTST) showed that the identified trajectories were distinct. The trajectories included those with (I, II) disease escalation (2 trajectories, 27% and 38% of patients) and (III) stable disease (1 trajectory, 35% of patients). For trajectory prediction from early-year data, HMLSs combining the stochastic neighbor embedding algorithm (SNEA, as a DRA) or the locally linear embedding algorithm (LLEA, as a DRA) with the new probabilistic neural network classifier (NPNNC, as a classifier) resulted in accuracies of 78.4% and 79.2%, respectively. Other HMLSs, such as SNEA + Lib_SVM (library for support vector machines) and t_SNE (t-distributed stochastic neighbor embedding) + NPNNC, resulted in accuracies of 76.5% and 76.1%, respectively.
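A hybrid system of this kind, a dimension reduction algorithm chained to a classifier, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses scikit-learn's `LocallyLinearEmbedding` for the LLE step and `SVC` (which is built on libsvm) as the classifier, because the NPNNC is not available in scikit-learn. The function names, hyperparameters, and synthetic data are hypothetical.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_hybrid_model(n_components=3, n_neighbors=12):
    """Chain scaling, LLE dimension reduction, and an SVM classifier."""
    return make_pipeline(
        StandardScaler(),
        # The dense eigensolver is slower but avoids ARPACK failures on small data.
        LocallyLinearEmbedding(
            n_neighbors=n_neighbors,
            n_components=n_components,
            eigen_solver="dense",
        ),
        SVC(kernel="rbf"),
    )


def make_toy_early_data(n_per_class=80, n_features=8, seed=0):
    """Synthetic stand-in for early-year features with three trajectory labels."""
    rng = np.random.default_rng(seed)
    shifts = np.array([0.0, 1.5, 3.0])
    X = np.vstack([s + rng.normal(size=(n_per_class, n_features)) for s in shifts])
    y = np.repeat(np.arange(3), n_per_class)
    return X, y


X, y = make_toy_early_data()
# Fitting the reducer inside the pipeline keeps each held-out fold out of the embedding.
fold_accuracy = cross_val_score(build_hybrid_model(), X, y, cv=5, scoring="accuracy")
print(f"mean cross-validated accuracy: {fold_accuracy.mean():.3f}")
```

Wrapping the reducer and classifier in one pipeline means the embedding is re-fitted on each training fold, so the reported accuracy is not inflated by information from the test fold. Note that scikit-learn's `TSNE` has no `transform` method and therefore cannot be dropped into such a pipeline unchanged.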
Conclusions: This study moves beyond cross-sectional PD subtyping to clustering of longitudinal disease trajectories. We conclude that combining medical information with SPECT-based radiomics features, together with optimal utilization of HMLSs, can identify distinct disease trajectories in PD patients and enable effective prediction of disease trajectories from early-year data.
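The distinctness check named in the Results, Hotelling's T-squared test, can be sketched for two groups as below. This is the standard two-sample form with a pooled covariance matrix and an F-distribution p-value, assuming NumPy and SciPy; the function name and the synthetic samples are hypothetical, and the study's exact test configuration is not specified in the abstract.

```python
import numpy as np
from scipy.stats import f as f_dist


def hotelling_t2(sample_a, sample_b):
    """Two-sample Hotelling's T-squared test; returns (t2, f_stat, p_value)."""
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    n_a, p = a.shape
    n_b = b.shape[0]
    diff = a.mean(axis=0) - b.mean(axis=0)
    # Pooled covariance assumes both groups share one covariance matrix.
    cov_a = np.atleast_2d(np.cov(a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(b, rowvar=False))
    pooled = ((n_a - 1) * cov_a + (n_b - 1) * cov_b) / (n_a + n_b - 2)
    t2 = (n_a * n_b) / (n_a + n_b) * diff @ np.linalg.solve(pooled, diff)
    # Convert T-squared to an F statistic with (p, n_a + n_b - p - 1) degrees of freedom.
    dof2 = n_a + n_b - p - 1
    f_stat = t2 * dof2 / (p * (n_a + n_b - 2))
    p_value = f_dist.sf(f_stat, p, dof2)
    return t2, f_stat, p_value


rng = np.random.default_rng(0)
group_1 = rng.normal(loc=0.0, size=(60, 3))
group_2 = rng.normal(loc=1.5, size=(60, 3))
t2, f_stat, p_value = hotelling_t2(group_1, group_2)
print(f"T2 = {t2:.1f}, F = {f_stat:.1f}, p = {p_value:.2e}")
```

With three trajectories, one option is to apply the test to each pair of groups and correct for multiple comparisons.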