B. Wingfield, S. Coleman, T. McGinnity, A. Bjourson
A metagenomic hybrid classifier for paediatric inflammatory bowel disease
2016 International Joint Conference on Neural Networks (IJCNN), 24 July 2016
DOI: 10.1109/IJCNN.2016.7727318
Citations: 12
Abstract
Inflammatory bowel disease (IBD) is a group of inflammatory diseases of the human colon and small intestine. IBD symptoms are non-specific, and diagnosis can be delayed because an invasive colonoscopy is required for confirmation. Delayed diagnosis is linked to poor growth in children. Imbalances in the human intestinal microbiome (the community of microorganisms that reside in the human gut) are thought to contribute to the development of IBD. To date, work on classifying host health status from patterns in human microbiomes with supervised learning has focused on modelling what is present in the gut (i.e. a bacterial census) with the random forest algorithm. Metagenomic shotgun sequencing is required to understand what is occurring in the gut (i.e. gene functions) and is often cost-prohibitive for hundreds of samples. However, gene functions can be predicted with the Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt) software package, which could represent a valuable source of new features. In this paper we investigate feature relevance across the feature set with the Boruta algorithm. We find that the majority of relevant features are from the predicted metagenome. Support vector machines (SVM) and multilayer perceptrons (MLP) are rarely used with microbiomic datasets but offer several theoretical advantages. To determine whether the new features and alternative algorithms are appropriate, we experiment with a range of machine learning and computational intelligence algorithms. Using the best-performing algorithms, we also implement a conditional multiple classifier system that can identify IBD presence, IBD subtype, and IBD activity from a non-invasive stool sample.
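
The conditional multiple classifier system mentioned above can be pictured as a cascade in which later classifiers run only when an earlier one reports IBD. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the label strings ("IBD", "healthy", "CD", "UC", "active", "remission"), and the toy threshold rules are all hypothetical, and the sketch assumes that the subtype and activity stages are both gated on a positive presence call.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# A classifier here is any callable mapping a feature vector to a label.
# In practice these would be trained models; that part is out of scope.
Classifier = Callable[[Sequence[float]], str]


@dataclass
class Diagnosis:
    has_ibd: bool
    subtype: Optional[str] = None   # filled in only when has_ibd is True
    activity: Optional[str] = None  # filled in only when has_ibd is True


def classify_sample(
    features: Sequence[float],
    presence_clf: Classifier,
    subtype_clf: Classifier,
    activity_clf: Classifier,
) -> Diagnosis:
    """Run the subtype and activity stages only if stage one predicts IBD."""
    if presence_clf(features) != "IBD":
        return Diagnosis(has_ibd=False)
    return Diagnosis(
        has_ibd=True,
        subtype=subtype_clf(features),
        activity=activity_clf(features),
    )


# Hypothetical threshold rules standing in for trained classifiers.
def toy_presence(x: Sequence[float]) -> str:
    return "IBD" if x[0] > 0.5 else "healthy"


def toy_subtype(x: Sequence[float]) -> str:
    return "CD" if x[1] > 0.5 else "UC"


def toy_activity(x: Sequence[float]) -> str:
    return "active" if x[2] > 0.5 else "remission"


healthy = classify_sample([0.1, 0.9, 0.9], toy_presence, toy_subtype, toy_activity)
patient = classify_sample([0.9, 0.2, 0.8], toy_presence, toy_subtype, toy_activity)
print(healthy)
print(patient)
```

Gating the later stages on the first keeps the subtype and activity classifiers from being asked about samples they were never meant to handle, which is the main design point of a conditional cascade.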