Metabolic Profiling of 1H NMR Spectra in Chronic Kidney Disease with Local Predictive Modeling

M. Luck, A. Yartseva, G. Bertho, E. Thervet, P. Beaune, N. Pallet, C. Damon
Published in: Proceedings of the ... International Conference on Machine Learning and Applications, pp. 176-181, 2015-12-01
DOI: https://doi.org/10.1109/ICMLA.2015.155
Citations: 6

Abstract

Metabolic profiling, the study of changes in metabolite concentrations in an organism induced by biological differences within subpopulations, must handle very large amounts of complex data and therefore requires powerful data processing and machine learning methods. Over-fitting is a common concern in metabolic profiling, where the number of features is often much larger than the number of observations. To overcome it, many predictive analyses have combined dimension reduction techniques with multivariate linear predictive modeling. These analyses build a global model that identifies biomarkers predictive of the output of interest from their overall trend variations. However, such a model fails to capture local biological phenomena underlying subgroups of subjects. More recently, local exploration methods based on decision trees have been applied in metabolomics, but they explore only random parts of the feature space. In this study, we used a supervised rule-mining algorithm that locally and exhaustively explores the feature space to predict chronic kidney disease (CKD) stages from proton nuclear magnetic resonance (1H NMR) data. From the discriminant subregions obtained with this exploration, we extracted local features and learned an L2-regularized logistic regression (L2LR) classifier. We compared the resulting local predictive model with two alternatives: a standard global model, combining classical univariate supervised feature selection techniques with an L2LR, and a mixed model using both global and local features. Results show that the local predictive model achieves better predictive performance than the mixed and global models. Additionally, it gives key insights into biological variations specific to subgroups of subjects.
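
The global baseline mentioned in the abstract (univariate supervised feature selection followed by an L2LR) could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the data are synthetic, a Welch t-statistic filter stands in for the unspecified "classical univariate" techniques, the task is simplified to two classes, and every function name and hyperparameter (`k`, `lam`, `lr`, `epochs`) is an assumption made for the example. The rule-mining step that produces local features is not shown.

```python
import math
import random


def t_score(values, labels):
    """Absolute Welch t-statistic of one feature between classes 0 and 1."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    var_pos = sum((v - mean_pos) ** 2 for v in pos) / (len(pos) - 1)
    var_neg = sum((v - mean_neg) ** 2 for v in neg) / (len(neg) - 1)
    return abs(mean_pos - mean_neg) / math.sqrt(
        var_pos / len(pos) + var_neg / len(neg) + 1e-12
    )


def select_top_k(X, y, k):
    """Indices of the k features with the largest univariate score."""
    scores = [t_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l2_logreg(X, y, lam=0.1, lr=0.1, epochs=300):
    """Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        # The L2 penalty adds lam * w to the gradient; the bias is not penalized.
        w = [wj - lr * (g / n + lam * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Predict class 1 when the linear score is positive."""
    return [1 if sum(wj * xj for wj, xj in zip(w, row)) + b > 0 else 0 for row in X]


def run_demo(seed=0, n_samples=40, n_features=200, n_informative=3, k=3):
    """Toy run on synthetic data with far more features than samples."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = []
    for label in y:
        shift = 1.5 if label == 1 else -1.5
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):  # only the first features carry signal
            row[j] += shift
        X.append(row)
    n_train = 30
    X_train, y_train = X[:n_train], y[:n_train]
    X_test, y_test = X[n_train:], y[n_train:]
    # Select features on the training split only, to avoid selection bias.
    selected = select_top_k(X_train, y_train, k)

    def reduce(rows):
        return [[row[j] for j in selected] for row in rows]

    w, b = fit_l2_logreg(reduce(X_train), y_train)
    predictions = predict(reduce(X_test), w, b)
    accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
    return selected, accuracy


if __name__ == "__main__":
    selected, accuracy = run_demo()
    print("selected features:", selected, "held-out accuracy:", accuracy)
```

Fitting the selection step on the training split alone matters in the setting the abstract describes: with far more features than observations, selecting features on the full data set would leak label information into the held-out evaluation.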