Decision tree supported substructure prediction of metabolites from GC-MS profiles.

Jan Hummel, Nadine Strehmel, Joachim Selbig, Dirk Walther, Joachim Kopka
Metabolomics: Official journal of the Metabolomic Society, pp. 322-333 (published 2010-06-01; e-published 2010-02-16). DOI: 10.1007/s11306-010-0198-7
Citations: 307

Abstract

Gas chromatography coupled to mass spectrometry (GC-MS) is one of the most widespread routine technologies applied to the large-scale screening and discovery of novel metabolic biomarkers. However, the majority of mass spectral tags (MSTs) currently remain unidentified because authenticated pure reference substances, which are required for compound identification by GC-MS, are lacking. Here, we accessed the information on reference compounds stored in the Golm Metabolome Database (GMD) to apply supervised machine learning approaches to the classification and identification of unidentified MSTs without relying on library searches. Non-annotated MSTs with mass spectral and retention index (RI) information, together with data on already identified metabolites and reference substances, have been archived in the GMD. Structural feature extraction was applied to subdivide the metabolite space contained in the GMD and to define the prediction target classes. Decision tree (DT)-based prediction of the most frequent substructures from mass spectral features and RI information is demonstrated to result in highly sensitive and specific detection of substructures contained in the compounds. The underlying set of DTs can be inspected by the user and is made available for batch processing via SOAP (Simple Object Access Protocol)-based web services. The GMD mass spectral library with the integrated DTs is freely accessible for non-commercial use at http://gmd.mpimp-golm.mpg.de/. All matching and structure search functionalities are available as SOAP-based web services. An XML + HTTP interface, which follows Representational State Transfer (REST) principles, facilitates read-only access to database entities.
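The abstract describes decision trees (DTs) that predict whether a frequent substructure is present in an unidentified MST from its mass spectral features and RI, with performance judged by sensitivity and specificity. The abstract does not specify the tree-induction algorithm or the feature encoding, so the following is only a minimal pure-Python sketch under stated assumptions: a CART-style binary tree that chooses axis-aligned threshold splits by Gini impurity, a hypothetical two-value feature vector (relative intensity of one diagnostic fragment ion, and the RI), and made-up toy data. All names (`build_tree`, `predict`, `sensitivity_specificity`, and so on) are illustrative and are not part of the GMD or its web services.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical feature vector per MST (an assumption, not the paper's encoding):
# [relative intensity of a diagnostic fragment ion, retention index (RI)].
# Label 1 = substructure present, 0 = absent.
Sample = List[float]


@dataclass
class Node:
    """Tree node: leaves use `prediction`; internal nodes test one feature."""
    prediction: int = 0
    feature: Optional[int] = None   # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None   # samples with x[feature] <= threshold
    right: Optional["Node"] = None  # samples with x[feature] > threshold


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(X: List[Sample], y: List[int]) -> Optional[Tuple[int, float]]:
    """Return the (feature, threshold) that most reduces weighted Gini impurity."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # midpoint between adjacent observed values
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X: List[Sample], y: List[int], depth: int = 0, max_depth: int = 3) -> Node:
    """Grow the tree recursively until nodes are pure or max_depth is reached."""
    majority = 1 if 2 * sum(y) >= len(y) else 0
    if depth >= max_depth or len(set(y)) == 1:
        return Node(prediction=majority)
    split = best_split(X, y)
    if split is None:  # no split lowers the impurity
        return Node(prediction=majority)
    f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return Node(
        prediction=majority, feature=f, threshold=t,
        left=build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
        right=build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth),
    )


def predict(node: Node, x: Sample) -> int:
    """Follow the threshold tests from the root down to a leaf."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


def sensitivity_specificity(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


# Toy, made-up training data (not taken from the GMD).
X = [[0.80, 1500.0], [0.75, 1650.0], [0.90, 1720.0],
     [0.10, 1480.0], [0.05, 1900.0], [0.20, 1300.0]]
y = [1, 1, 1, 0, 0, 0]
tree = build_tree(X, y)
print(tree.feature, round(tree.threshold, 3))  # 0 0.475
print(predict(tree, [0.85, 1600.0]), predict(tree, [0.15, 1600.0]))  # 1 0
print(sensitivity_specificity(y, [predict(tree, x) for x in X]))  # (1.0, 1.0)
```

Because each internal node is a single threshold test, a fitted tree can be read directly (here the root tests feature 0 at about 0.475), which is one reason tree models lend themselves to the kind of user inspection the abstract mentions. A realistic setup would use many more spectral features per MST and held-out data for estimating sensitivity and specificity; training-set figures like the ones printed above are optimistic.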

Abstract Image

Abstract Image

Abstract Image
