Predicting the Association of Metabolites with Both Pathway Categories and Individual Pathways.

IF 3.4 · Zone 3 (Biology) · JCR Q2 (Biochemistry & Molecular Biology)
Metabolites Pub Date : 2024-09-21 DOI:10.3390/metabo14090510
Erik D Huckvale, Hunter N B Moseley
Citations: 0

Abstract

Metabolism is a network of chemical reactions that sustain cellular life. Parts of this metabolic network are defined as metabolic pathways containing specific biochemical reactions. Products and reactants of these reactions are called metabolites, which are associated with certain human-defined metabolic pathways. Metabolic knowledgebases, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG), contain metabolites, reactions, and pathway annotations; however, such resources are incomplete due to current limits of metabolic knowledge. To fill in missing metabolite pathway annotations, past machine learning models showed some success in predicting the KEGG Level 2 pathway category involvement of metabolites based on their chemical structure. Here, we present the first machine learning model to predict metabolite association with the more granular KEGG Level 3 metabolic pathways. We used a feature and dataset engineering approach to generate over one million metabolite-pathway entries in the dataset used to train a single binary classifier. This approach produced a mean Matthews correlation coefficient (MCC) of 0.806 ± 0.017 SD across 100 cross-validation iterations. The 172 Level 3 pathways were predicted with an overall MCC of 0.726. Moreover, metabolite association with the 12 Level 2 pathway categories was predicted with an overall MCC of 0.891, representing significant transfer learning from the Level 3 pathway entries. These are the best metabolite pathway prediction results published so far in the field.
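
As a rough illustration of two ideas in the abstract, the sketch below pairs metabolites with pathways into labeled entries for a single binary classifier and computes the Matthews correlation coefficient from predictions. It is not the authors' code: the cross-product pairing, the function names, and the toy feature vectors are assumptions made for illustration, and the abstract does not specify how features were actually built.

```python
from itertools import product
from math import sqrt


def build_pair_entries(metabolite_features, pathway_features, known_links):
    """Cross every metabolite with every pathway into one labeled entry each.

    All three arguments are hypothetical stand-ins for illustration:
      metabolite_features: dict of metabolite id -> list of numeric features
      pathway_features:    dict of pathway id -> list of numeric features
      known_links:         set of (metabolite id, pathway id) annotated pairs
    """
    entries = []
    for (m_id, m_vec), (p_id, p_vec) in product(
        metabolite_features.items(), pathway_features.items()
    ):
        label = 1 if (m_id, p_id) in known_links else 0
        # Concatenate both feature vectors so one binary classifier
        # can score any metabolite-pathway pair.
        entries.append((m_vec + p_vec, label))
    return entries


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when the denominator is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy data, invented purely for this example.
    metabolites = {"m1": [1.0, 0.0], "m2": [0.0, 1.0]}
    pathways = {"p1": [0.5], "p2": [0.2], "p3": [0.9]}
    links = {("m1", "p1"), ("m2", "p3")}
    dataset = build_pair_entries(metabolites, pathways, links)
    print(len(dataset), "entries")
    print("MCC:", mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

Under this reading, the number of entries grows as the product of metabolites and pathways, which is one way a dataset could reach the scale the abstract reports. MCC ranges from -1 to 1 and uses all four confusion-matrix counts, so it remains informative when negative pairs greatly outnumber positive ones.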

Source journal: Metabolites
Subject area: Biochemistry, Genetics and Molecular Biology (Molecular Biology)
CiteScore: 5.70
Self-citation rate: 7.30%
Articles published: 1070
Review time: 17.17 days
About the journal: Metabolites (ISSN 2218-1989) is an international, peer-reviewed open access journal of metabolism and metabolomics. Metabolites publishes original research articles and review articles in all molecular aspects of metabolism relevant to the fields of metabolomics, metabolic biochemistry, computational and systems biology, biotechnology and medicine, with a particular focus on the biological roles of metabolites and small molecule biomarkers. Metabolites encourages scientists to publish their experimental and theoretical results in as much detail as possible. Therefore, there is no restriction on article length. Sufficient experimental details must be provided to enable the results to be accurately reproduced. Electronic material representing additional figures, materials and methods explanation, or supporting results and evidence can be submitted with the main manuscript as supplementary material.