Xiang Zhou, Qiushuang Li, Shizhe Zhang, Wenxing Wang, Rong Wang, Xiumin Zhang, Zhiliang Tan, Min Wang
Bioinformatics Advances, 5(1), vbaf207. Published 2025-08-29. DOI: 10.1093/bioadv/vbaf207 (https://doi.org/10.1093/bioadv/vbaf207). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12460045/pdf/
GCompip: a pipeline for estimating the gene abundance in microbial communities.
Motivation: Gene abundance in metagenome datasets is commonly expressed as Counts or Copies Per Million. However, these measures do not account for the size of the microbial community. To reflect gene abundance in microbial communities (GAM), we developed GCompip, a comprehensive pipeline for estimating GAM that is based on a specialized universal single-copy gene (USCG) database, stringent alignment parameters, and rigorous filtering criteria.
Results: GCompip showed high specificity without compromising computational efficiency and improved the precision of downstream GAM estimates across six diverse ecological environments (human gut, rumen, freshwater, marine, hydrothermal sediment, and glacier). In contrast, the annotation tools used for comparison (KofamScan, eggNOG-mapper, and HUMAnN3) showed larger error intervals, higher susceptibility to false positives, or overestimation of USCG abundance, primarily because of more relaxed thresholds, multifamily matches, or less stringent alignment settings. To broaden the applicability of GCompip, we provide both a Linux command-line version and an R package. Overall, GCompip is an accurate, robust, user-friendly, and efficient computational pipeline for calculating GAM from metagenomic sequencing data. It makes GAM estimation accessible to researchers seeking to evaluate the metabolic capabilities of microbial communities and improves their capacity to interpret metagenomic data from those communities.
Availability and implementation: GCompip package source code and documentation are freely available for download at https://github.com/XiangZhouCAS/GCompip. A separate Linux command line version is available at https://github.com/XiangZhouCAS/GCompip_onlinux.
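The contrast drawn in the Motivation, between a per-million measure and one that accounts for community size, can be illustrated with a minimal sketch. The abstract does not state the formula GCompip uses, so everything below is an assumption made for illustration: the function names `copies_per_million` and `per_genome_abundance`, the use of length-normalized read counts (reads per kilobase, RPK) as input, and the choice of the median USCG signal as a proxy for the number of genome equivalents in a sample.

```python
from statistics import median


def copies_per_million(gene_rpk):
    """Rescale length-normalized counts (RPK) so that they sum to one million.

    The result depends only on the composition of the gene pool,
    not on how many genomes contributed to it.
    """
    total = sum(gene_rpk.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return {gene: rpk / total * 1e6 for gene, rpk in gene_rpk.items()}


def per_genome_abundance(gene_rpk, uscg_rpk):
    """Express each gene as copies per genome equivalent (hypothetical formulation).

    Assumption: every genome carries each universal single-copy gene once,
    so the median USCG RPK approximates the signal of one average genome.
    """
    genome_equivalents = median(uscg_rpk.values())
    if genome_equivalents <= 0:
        raise ValueError("median USCG abundance must be positive")
    return {gene: rpk / genome_equivalents for gene, rpk in gene_rpk.items()}


if __name__ == "__main__":
    # Two made-up samples: geneX has the same RPK in both, but sample B
    # contains half as many genome equivalents and more unrelated genes.
    sample_a = {"geneX": 50.0, "other": 950.0}
    uscg_a = {"uscg1": 100.0, "uscg2": 90.0, "uscg3": 110.0}
    sample_b = {"geneX": 50.0, "other": 1950.0}
    uscg_b = {"uscg1": 50.0, "uscg2": 45.0, "uscg3": 55.0}

    for name, genes, uscg in (("A", sample_a, uscg_a), ("B", sample_b, uscg_b)):
        cpm = copies_per_million(genes)["geneX"]
        per_genome = per_genome_abundance(genes, uscg)["geneX"]
        print(f"sample {name}: CPM={cpm:.0f}, copies per genome={per_genome:.2f}")
```

With these made-up numbers, the per-million value of `geneX` falls from sample A to sample B while its per-genome value rises. That is the kind of discrepancy a community-size-aware measure is meant to expose. The real pipeline additionally relies on its USCG database, alignment parameters, and filtering criteria, none of which are modeled here.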