A machine learning framework of functional biomarker discovery for different microbial communities based on metagenomic data
Wei Fang, Xingzhi Chang, Xiaoquan Su, Jian Xu, Deli Zhang, K. Ning
2012 IEEE 6th International Conference on Systems Biology (ISB), published 2012-09-27
DOI: 10.1109/ISB.2012.6314121 (https://doi.org/10.1109/ISB.2012.6314121)
Citation count: 5
Abstract
Because more than 90% of a microbial community cannot be isolated and cultivated, metagenomic methods are commonly used to analyze the microbial community as a whole. With the rapid accumulation of metagenomic samples, there is growing interest in finding simple biomarkers, especially functional biomarkers, that can distinguish different metagenomic samples. Next-generation sequencing techniques have enabled highly accurate detection of gene presence (abundance) values in metagenomic studies, and the presence/absence or differing abundance values of a set of genes can serve as a biomarker for identifying the phenotype of the corresponding microbial community. However, it is not yet clear how to select such a set of genes (features), or how accurately the selected genes predict a microbial community's phenotype. In this study, we evaluated different machine learning methods, including feature selection methods and classification methods, for selecting biomarkers that distinguish different samples. We then propose a machine learning framework that discovers biomarkers for different microbial communities by mining metagenomic data. Given a set of features (genes) and their presence values in multiple samples, we first select discriminative features as candidates by feature selection, and then select the feature sets with low error rates and high classification accuracies as biomarkers using a classification method. We used whole-genome sequencing data from simulation, the public domain, and in-house metagenomic data generation facilities, and tested the framework on the prediction and evaluation of biomarkers. The results show that the framework can select functional biomarkers with very high accuracy. This framework is therefore a suitable tool for discovering functional biomarkers that distinguish different microbial communities.
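The two-stage procedure described in the abstract (feature selection to obtain candidate genes, then a classifier to keep the feature set with the lowest error rate) can be illustrated with a minimal sketch. The abstract does not name the specific methods evaluated, so everything concrete here is an assumption made for illustration: ranking genes by the difference in mean presence between two phenotypes, a nearest-centroid classifier, leave-one-out cross-validation, and the toy matrix `X` with labels `y`. The function names are hypothetical and are not taken from the paper.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def rank_genes(X, y):
    """Stage 1 (assumed method): rank gene indices by the absolute
    difference in mean presence between the two phenotypes."""
    label_a, label_b = sorted(set(y))  # this sketch handles exactly two classes
    scores = []
    for g in range(len(X[0])):
        in_a = [row[g] for row, lab in zip(X, y) if lab == label_a]
        in_b = [row[g] for row, lab in zip(X, y) if lab == label_b]
        scores.append(abs(mean(in_a) - mean(in_b)))
    return sorted(range(len(X[0])), key=lambda g: scores[g], reverse=True)


def predict(X_train, y_train, sample, genes):
    """Stage 2 (assumed method): nearest-centroid classification
    restricted to the selected genes."""
    best_label, best_dist = None, float("inf")
    for lab in sorted(set(y_train)):
        rows = [r for r, l in zip(X_train, y_train) if l == lab]
        centroid = [mean([r[g] for r in rows]) for g in genes]
        dist = sum((sample[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = lab, dist
    return best_label


def loo_error(X, y, k):
    """Leave-one-out error rate when using the top-k genes.
    Genes are re-ranked inside each fold so the held-out sample
    never influences feature selection."""
    errors = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = rank_genes(X_train, y_train)[:k]
        if predict(X_train, y_train, X[i], genes) != y[i]:
            errors += 1
    return errors / len(X)


def discover_biomarkers(X, y, sizes):
    """Pick the feature-set size with the lowest error rate
    (ties go to the smaller set) and return those genes."""
    best_k = min(sizes, key=lambda k: (loo_error(X, y, k), k))
    return rank_genes(X, y)[:best_k], loo_error(X, y, best_k)


# Toy presence/absence matrix: rows are samples, columns are genes.
X = [
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 1, 0],
]
y = ["A", "A", "A", "B", "B", "B"]  # hypothetical phenotype labels

biomarkers, error = discover_biomarkers(X, y, sizes=[1, 2, 3])
print(biomarkers, error)
```

In this toy matrix gene 0 is present in every "A" sample and absent from every "B" sample, so the sketch keeps it as the single biomarker. Real abundance data would need normalization and a larger, properly held-out evaluation, which this sketch does not attempt.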