Developing a New Phylogeny-Driven Random Forest Model for Functional Metagenomics

IF 3.7 | Zone 4, Biology | JCR Q1, Biochemical Research Methods
Jyotsna Talreja Wassan;Haiying Wang;Huiru Zheng
Journal: IEEE Transactions on NanoBioscience
Published: 2023-06-06
DOI: 10.1109/TNB.2023.3283462
URL: https://ieeexplore.ieee.org/document/10144805/
Citation count: 0

Abstract

Metagenomics is an unobtrusive science that links microbial genes to biological functions or environmental states. Classifying microbial genes into their functional repertoire is an important task in the downstream analysis of metagenomic studies, and it relies on supervised Machine Learning (ML) methods to achieve good classification performance. Random Forest (RF) has been applied extensively to microbial gene abundance profiles to map them to functional phenotypes. This research tunes RF using the evolutionary ancestry encoded in microbial phylogeny, yielding a Phylogeny-RF model for the functional classification of metagenomes. The method captures the effects of phylogenetic relatedness within the ML classifier itself, rather than simply applying a supervised classifier to the raw abundances of microbial genes. The idea rests on the observation that phylogenetically close microbes are highly correlated and tend to share genetic and phenotypic traits. Because such microbes behave similarly, they tend to be selected together, or one of them can be dropped from the analysis, to improve the ML process. The proposed Phylogeny-RF algorithm was compared with state-of-the-art classification methods, including RF and the phylogeny-aware methods MetaPhyl and PhILR, on three real-world 16S rRNA metagenomic datasets. It performed significantly better than the traditional RF model and also better than the other phylogeny-driven benchmarks (p < 0.05). For example, on soil microbiomes Phylogeny-RF attained the highest AUC (0.949) and Kappa (0.891) among the compared methods.
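
The abstract does not spell out how Phylogeny-RF uses the tree, so the following is only an illustrative sketch of the underlying intuition (closely related, highly correlated taxa are redundant, so one of them can be dropped before training a classifier). It is not the authors' algorithm. The function name `prune_by_phylogeny`, the pairwise distance table, the variance-based ranking, and the `threshold` cutoff are all assumptions made for this example.

```python
from statistics import pvariance


def prune_by_phylogeny(abundance, distance, threshold):
    """Greedily keep one representative per group of closely related taxa.

    abundance: dict mapping taxon -> list of per-sample abundances
    distance:  dict mapping frozenset({a, b}) -> phylogenetic distance
               (hypothetical input, e.g. derived from a tree)
    threshold: taxa closer than this are treated as redundant (assumed cutoff)
    """
    # Visit taxa from most to least variable, so that within a group of
    # near-identical taxa the more variable one is retained.
    ranked = sorted(abundance, key=lambda t: (-pvariance(abundance[t]), t))
    kept = []
    for taxon in ranked:
        # Keep the taxon only if it is far enough from everything kept so far.
        if all(distance[frozenset((taxon, k))] >= threshold for k in kept):
            kept.append(taxon)
    return kept


# Toy data: otu_a and otu_b are close relatives with similar profiles.
abundance = {
    "otu_a": [10, 0, 12, 1],
    "otu_b": [9, 1, 11, 2],
    "otu_c": [3, 3, 4, 3],
}
distance = {
    frozenset(("otu_a", "otu_b")): 0.02,
    frozenset(("otu_a", "otu_c")): 0.60,
    frozenset(("otu_b", "otu_c")): 0.58,
}

selected = prune_by_phylogeny(abundance, distance, threshold=0.1)
print(selected)
```

In this toy case `otu_b` is dropped because it sits within the cutoff of the more variable `otu_a`. The reduced feature set could then be passed to any standard Random Forest implementation; the choice of ranking criterion and cutoff would need to be validated on real data.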
Source journal
IEEE Transactions on NanoBioscience (Engineering and Technology: Nanoscience and Nanotechnology)
CiteScore: 7.00
Self-citation rate: 5.10%
Articles published: 197
Review time: >12 weeks
Journal description: The IEEE Transactions on NanoBioscience reports on original, innovative, and interdisciplinary work on all aspects of molecular systems, cellular systems, and tissues (including molecular electronics). Topics covered in the journal span a broad spectrum, covering both foundations and applications. Specifically, the journal covers methods and techniques, experimental aspects, design and implementation, instrumentation and laboratory equipment, clinical aspects, hardware and software for data acquisition and analysis, and computer-based modelling (based on traditional or high-performance computing, parallel computers, or computer networks).