Association between serum multi-protein biomarker profile and real-world disability in multiple sclerosis

Wen Zhu, Chenyi Chen, Lili Zhang, Tammy Hoyt, Elizabeth Walker, Shruthi Venkatesh, Fujun Zhang, Ferhan Qureshi, John F Foley, Zongqi Xia
Brain Communications. Published 2023-10-31. DOI: 10.1093/braincomms/fcad300 (https://doi.org/10.1093/braincomms/fcad300)

Abstract

Few studies have examined blood biomarkers that are informative of patient-reported outcomes (PROs) of disability in people with multiple sclerosis (MS). We examined the associations between serum multi-protein biomarker profiles and patient-reported disability. In this cross-sectional study (2017-2020), adults with a diagnosis of MS (or its precursors) from two independent clinic-based cohorts were divided into a training set and a test set. As predictors, we examined 7 clinical factors (age at sample collection, sex, race/ethnicity, disease subtype, disease duration, disease-modifying therapy [DMT], and time interval between sample collection and the closest PRO assessment) and 19 serum protein biomarkers that prior studies identified as potentially associated with MS disease activity endpoints. We trained machine learning (ML) models (Least Absolute Shrinkage and Selection Operator [LASSO] regression, Random Forest, Extreme Gradient Boosting, Support Vector Machines, stacking ensemble learning, and stacking classification) to predict the Patient Determined Disease Steps (PDDS) score as the primary endpoint, and we report model performance on the held-out test set. The study included 431 participants (mean age 49 years, 81% women, 94% non-Hispanic White). For the binary PDDS score, the combined feature input of routine clinical factors and the 19 proteins consistently outperformed base models (comprising clinical features alone, or clinical features plus one single protein at a time) in predicting severe (PDDS ≥ 4) versus mild/moderate (PDDS < 4) disability across multiple ML approaches, with LASSO achieving the best area under the curve (AUC_PDDS = 0.91) and the best results on other metrics. For the ordinal PDDS score, LASSO models using the combined clinical factors and 19 proteins as feature input (R²_PDDS = 0.31) again outperformed base models. The two best-performing LASSO models (i.e., binary and ordinal PDDS) shared 6 clinical features (age, sex, race/ethnicity, disease subtype, disease duration, DMT efficacy) and 9 proteins (cluster of differentiation 6, CUB-domain-containing protein 1, contactin-2, interleukin-12 subunit-beta, neurofilament light chain [NfL], protogenin, serpin family A member 9, tumor necrosis factor superfamily member 13B, versican). By comparison, LASSO models given clinical features plus one single protein at a time as feature input did not select either NfL or glial fibrillary acidic protein (GFAP) as a final feature, and forcing either NfL or GFAP into the models as a single protein feature did not improve performance beyond clinical features alone. A stacking classification model that used 5 functional pathways to represent multiple proteins as meta-features implicated the proteins involved in neuroaxonal integrity as significant contributors to predictive performance. Thus, serum multi-protein biomarker profiles improve the prediction of real-world MS disability status beyond the clinical profile alone or the clinical profile plus a single protein biomarker, reaching clinically actionable performance.
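To make the binary endpoint and its headline metric concrete, the following is a minimal sketch, not the authors' code. It dichotomizes PDDS scores at the threshold stated in the abstract (PDDS ≥ 4 is severe) and computes the area under the ROC curve from its rank-based definition. The function names `binarize_pdds` and `roc_auc`, the sample scores, and the predicted probabilities are all hypothetical and chosen only for illustration; they are not study data.

```python
from typing import List, Sequence

SEVERE_THRESHOLD = 4  # the abstract defines severe disability as PDDS >= 4


def binarize_pdds(scores: Sequence[int], threshold: int = SEVERE_THRESHOLD) -> List[int]:
    """Map ordinal PDDS scores to 1 (severe) or 0 (mild/moderate)."""
    return [1 if s >= threshold else 0 for s in scores]


def roc_auc(labels: Sequence[int], predictions: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen severe case receives a
    higher predicted value than a randomly chosen mild/moderate case.
    Tied predictions count as one half."""
    pos = [p for y, p in zip(labels, predictions) if y == 1]
    neg = [p for y, p in zip(labels, predictions) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case in each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical held-out PDDS scores and model-predicted probabilities
# of severe disability (illustrative values, not from the study).
pdds_test = [0, 1, 2, 3, 4, 5, 6, 7]
predicted = [0.05, 0.10, 0.30, 0.60, 0.55, 0.80, 0.90, 0.95]

labels = binarize_pdds(pdds_test)
auc = roc_auc(labels, predicted)
print(labels, auc)
```

In this toy example, one mild/moderate case (predicted 0.60) is ranked above one severe case (predicted 0.55), so 15 of the 16 severe versus mild/moderate pairs are ordered correctly and the AUC is 0.9375. The pairwise loop is quadratic in sample size, which is acceptable for a test set of a few hundred participants; a sort-based rank computation would be the usual choice for larger data.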