Metabolic Profiling of 1H NMR Spectra in Chronic Kidney Disease with Local Predictive Modeling

Proceedings of the ... International Conference on Machine Learning and Applications. Pub Date: 2015-12-01. DOI: 10.1109/ICMLA.2015.155

M. Luck, A. Yartseva, G. Bertho, E. Thervet, P. Beaune, N. Pallet, C. Damon

Pages: 176-181

Citations: 6

Abstract


Metabolic Profiling of 1H NMR Spectra in Chronic Kidney Disease with Local Predictive Modeling

Metabolic profiling, the study of changes in metabolite concentrations in an organism induced by biological differences within subpopulations, has to deal with very large amounts of complex data. It therefore requires powerful data processing and machine learning methods. To overcome over-fitting, a common concern in metabolic profiling where the number of features is often much larger than the number of observations, many predictive analyses have combined dimension reduction techniques with multivariate linear predictive modeling. Moreover, they built a global model that identifies biomarkers predictive of the output of interest given their overall trend variations. However, such a model fails to capture local biological phenomena underlying subgroups of subjects. More recently, local exploration methods based on decision-tree approaches have been applied in metabolomics, but they only explore random parts of the feature space. In this study, we used a supervised rule-mining algorithm that locally and exhaustively explores the feature space to predict chronic kidney disease (CKD) stages based on proton nuclear magnetic resonance (1H NMR) data. From the discriminant subregions obtained with this exploration, we extracted local features and learned an L2-regularized logistic regression (L2LR) classifier. We compared the resulting local predictive model with a standard one, which combines classical univariate supervised feature selection techniques with an L2LR, and with a model mixing both global and local features. Results show that the local predictive model achieves better predictive performance than the mixed and global models. Additionally, it gives key insights into biological variations specific to subgroups of subjects.
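To make the distinction between global and local features concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a "local feature" can be represented as a binary indicator of whether a spectral bin's intensity falls inside an interval. The rules are written by hand rather than produced by a rule-mining algorithm, the data are synthetic, and the L2-regularized logistic regression is fitted with plain gradient descent in standard-library Python. The global baseline here uses the raw bin intensities directly and omits the univariate feature selection step. All function names, rule bounds, and hyperparameters are hypothetical.

```python
import math

def rule_features(sample, rules):
    """Turn raw bin intensities into binary 'local' features.

    Each rule is (bin_index, low, high); its feature is 1.0 when the
    intensity of that bin lies in [low, high), else 0.0.
    """
    return [1.0 if low <= sample[j] < high else 0.0 for j, low, high in rules]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_l2lr(X, y, lam=0.01, lr=1.0, epochs=1000):
    """Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2.

    The bias term is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(X, y, w, b):
    # Fraction of samples whose thresholded probability matches the label.
    hits = 0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        hits += int((p >= 0.5) == (yi == 1))
    return hits / len(y)

def run_demo():
    # Synthetic stand-in for binned spectra: 200 samples, 2 bins in [0, 1).
    n = 200
    X = [[i / n, (37 * i % n) / n] for i in range(n)]
    # Toy "local" effect: the class is 1 only inside a band of bin 0.
    y = [1 if 0.4 <= x[0] < 0.6 else 0 for x in X]
    # Hand-written rules standing in for the output of a rule miner.
    rules = [(0, 0.4, 0.6), (1, 0.0, 0.5)]
    Z = [rule_features(x, rules) for x in X]
    # Even-indexed samples train the models, odd-indexed ones test them.
    tr, te = slice(0, n, 2), slice(1, n, 2)
    wg, bg = fit_l2lr(X[tr], y[tr])  # global model on raw intensities
    wl, bl = fit_l2lr(Z[tr], y[tr])  # local model on rule indicators
    return accuracy(X[te], y[te], wg, bg), accuracy(Z[te], y[te], wl, bl)

if __name__ == "__main__":
    global_acc, local_acc = run_demo()
    print(f"global features: {global_acc:.2f}, local features: {local_acc:.2f}")
```

The toy label depends on a band in the middle of one bin's range, so a linear model on the raw intensity has no monotone trend to exploit, while the interval indicator exposes the effect directly. This only illustrates why local features can help under that assumption; it says nothing about the performance figures reported in the paper.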


Source journal

Proceedings of the ... International Conference on Machine Learning and Applications. International Conference on Machine Learning and Applications

Self-citation rate

0.00%

Publication count