Jan Hummel, Nadine Strehmel, Joachim Selbig, Dirk Walther, Joachim Kopka
Metabolomics: Official Journal of the Metabolomic Society, pages 322-333. Published 2010-06-01 (Epub 2010-02-16). DOI: https://doi.org/10.1007/s11306-010-0198-7
Citations: 307
Decision tree supported substructure prediction of metabolites from GC-MS profiles.
Gas chromatography coupled to mass spectrometry (GC-MS) is one of the most widespread routine technologies applied to the large-scale screening and discovery of novel metabolic biomarkers. However, the majority of mass spectral tags (MSTs) currently remain unidentified because the authenticated pure reference substances required for compound identification by GC-MS are lacking. Here, we accessed the information on reference compounds stored in the Golm Metabolome Database (GMD) to apply supervised machine learning approaches to the classification and identification of unidentified MSTs without relying on library searches. Non-annotated MSTs with mass spectral and retention index (RI) information, together with data on already identified metabolites and reference substances, have been archived in the GMD. Structural feature extraction was applied to subdivide the metabolite space contained in the GMD and to define the prediction target classes. Decision tree (DT)-based prediction of the most frequent substructures from mass spectral features and RI information is shown to detect the substructures contained in the compounds with high sensitivity and specificity. The underlying set of DTs can be inspected by the user and is available for batch processing via SOAP (Simple Object Access Protocol)-based web services. The GMD mass spectral library with the integrated DTs is freely accessible for non-commercial use at http://gmd.mpimp-golm.mpg.de/. All matching and structure search functionalities are available as SOAP-based web services. An XML + HTTP interface, which follows Representational State Transfer (REST) principles, provides read-only access to database entities.
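The core idea of the abstract, predicting the presence of one substructure from mass spectral features plus RI and then reporting sensitivity and specificity, can be illustrated with a minimal sketch. It is not the GMD implementation: the tree learner below is a generic Gini-based binary tree, the feature names `mz_a`, `mz_b`, and `ri` are hypothetical, and the handful of data points are invented toy values rather than GMD records.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in rows[0]:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between neighbouring values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a binary tree; leaves hold the majority class label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

def sensitivity_specificity(y_true, y_pred):
    """Return (TP / (TP + FN), TN / (TN + FP)); None where a class is absent."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else None
    spec = tn / (tn + fp) if tn + fp else None
    return sens, spec

# Invented toy data: normalised intensities at two hypothetical m/z values plus RI.
# Label 1 means "substructure present", 0 means "absent".
train_rows = [
    {"mz_a": 0.90, "mz_b": 0.10, "ri": 1200.0},
    {"mz_a": 0.80, "mz_b": 0.20, "ri": 1350.0},
    {"mz_a": 0.85, "mz_b": 0.05, "ri": 1800.0},
    {"mz_a": 0.10, "mz_b": 0.70, "ri": 1250.0},
    {"mz_a": 0.20, "mz_b": 0.90, "ri": 1900.0},
    {"mz_a": 0.15, "mz_b": 0.60, "ri": 1500.0},
]
train_labels = [1, 1, 1, 0, 0, 0]

tree = build_tree(train_rows, train_labels)
print(tree)  # the fitted tree is a plain dict, so it can be inspected directly
```

A tree of this kind is trained once per target substructure, which is one way to read the abstract's "set of DTs". Because each tree is a nested dictionary of feature/threshold tests, every prediction can be traced back to explicit rules, which matches the abstract's point that the trees can be inspected by the user.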