GCompip: a pipeline for estimating the gene abundance in microbial communities.

IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY
Bioinformatics advances Pub Date : 2025-08-29 eCollection Date: 2025-01-01 DOI:10.1093/bioadv/vbaf207
Xiang Zhou, Qiushuang Li, Shizhe Zhang, Wenxing Wang, Rong Wang, Xiumin Zhang, Zhiliang Tan, Min Wang

Abstract

Motivation: Gene abundance in metagenomic datasets is commonly expressed as counts or copies per million. However, these measures do not account for the size of the microbial community. To reflect gene abundance in the microbial community (GAM), we developed GCompip, a comprehensive pipeline for estimating GAM that is built on a specialized database of universal single-copy genes (USCGs), stringent alignment parameters, and rigorous filtering criteria.
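
To illustrate the distinction drawn above, the following is a minimal sketch and not GCompip's actual implementation. It assumes a GAM-style value can be approximated as the length-normalized read rate of a target gene divided by the mean length-normalized rate of the USCGs, so that the result reads as gene copies per genome equivalent. The function names and all counts are hypothetical.

```python
from statistics import mean


def per_base_rate(read_count: int, gene_length_bp: int) -> float:
    """Reads per base: corrects for longer genes recruiting more reads."""
    return read_count / gene_length_bp


def copies_per_million(rates: dict[str, float]) -> dict[str, float]:
    """Scale each gene's rate so that all genes in the sample sum to one million."""
    total = sum(rates.values())
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


def uscg_normalized_abundance(target_rate: float, uscg_rates: list[float]) -> float:
    """Hypothetical GAM-style value: target gene copies per genome equivalent.

    Assumes each USCG occurs once per genome, so the mean USCG rate
    approximates the number of genomes sampled.
    """
    return target_rate / mean(uscg_rates)


# Made-up counts for one sample: (mapped reads, gene length in bp).
target = per_base_rate(50, 1000)
uscgs = [per_base_rate(100, 1000), per_base_rate(180, 1500), per_base_rate(64, 800)]
gam = uscg_normalized_abundance(target, uscgs)
print(f"target gene copies per genome equivalent: {gam:.2f}")
```

Copies per million only rescales a gene against the other genes in the same sample, whereas dividing by the USCG rate ties the value to an estimate of how many genomes were sequenced, which is the consideration the Motivation says is missing.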

Results: GCompip showed high specificity without compromising computational efficiency, and improved the precision of downstream GAM estimates across six diverse ecological environments (human gut, rumen, freshwater, marine, hydrothermal sediment, and glacier). In contrast, the annotation tools used for comparison (KofamScan, eggNOG-mapper, and HUMAnN3) showed larger error intervals, higher susceptibility to false positives, or overestimation of USCG abundance, primarily because of more relaxed thresholds, multifamily matches, or less stringent alignment settings. To make GCompip widely applicable, we provide both a Linux command-line version and an R package. Overall, GCompip is an accurate, robust, user-friendly, and efficient computational pipeline for calculating GAM from metagenomic sequencing data. It gives researchers an accessible way to evaluate the metabolic capabilities of microbial communities and improves their capacity to interpret metagenomic data from those communities.
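
The Results attribute the weaker performance of other tools to relaxed thresholds and multifamily matches. The sketch below shows, under stated assumptions, what a stricter hit filter might look like: it keeps only alignments above identity and length cutoffs and discards reads that still match more than one USCG family. The cutoff values, the `Hit` record, and `filter_hits` are illustrative inventions and are not taken from GCompip.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    read_id: str
    family: str      # USCG family the read aligned to
    identity: float  # percent identity of the alignment
    aln_length: int  # alignment length
    bitscore: float


# Illustrative cutoffs only; these are NOT GCompip's actual settings.
MIN_IDENTITY = 80.0
MIN_ALN_LENGTH = 25


def filter_hits(hits: list[Hit]) -> dict[str, Hit]:
    """Keep one confident hit per read; drop reads that match several families."""
    by_read: dict[str, list[Hit]] = defaultdict(list)
    for hit in hits:
        if hit.identity >= MIN_IDENTITY and hit.aln_length >= MIN_ALN_LENGTH:
            by_read[hit.read_id].append(hit)

    kept: dict[str, Hit] = {}
    for read_id, read_hits in by_read.items():
        families = {h.family for h in read_hits}
        if len(families) > 1:
            continue  # ambiguous: passes the cutoffs for more than one family
        kept[read_id] = max(read_hits, key=lambda h: h.bitscore)
    return kept


# Made-up alignments for three reads.
example_hits = [
    Hit("r1", "rpoB", 95.0, 40, 80.0),
    Hit("r1", "rpoB", 90.0, 35, 70.0),
    Hit("r2", "rpoB", 92.0, 30, 60.0),
    Hit("r2", "recA", 85.0, 30, 55.0),
    Hit("r3", "recA", 60.0, 40, 30.0),
]
kept_hits = filter_hits(example_hits)
print(sorted(kept_hits))
```

In this toy input only read `r1` survives: `r2` passes the cutoffs for two families and `r3` falls below the identity cutoff. Dropping ambiguous reads trades some sensitivity for specificity, which is the direction the Results describe.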

Availability and implementation: GCompip package source code and documentation are freely available for download at https://github.com/XiangZhouCAS/GCompip. A separate Linux command line version is available at https://github.com/XiangZhouCAS/GCompip_onlinux.
