Jing Yang, Siqi Sun, Ning Sun, Li Lu, Chengwu Zhang, Wanyu Shi, Yunhe Zhao, Shulei Jia
HMMER-extractor: An auxiliary toolkit for identifying genomic macromolecular metabolites based on hidden Markov models
International Journal of Biological Macromolecules, article 137666. Published 2024-11-17. DOI: 10.1016/j.ijbiomac.2024.137666 (https://doi.org/10.1016/j.ijbiomac.2024.137666)
Citations: 0
Abstract
The human microbiome contains various microbial macromolecules with important biological functions. Hidden Markov models (HMMs) can detect distantly related sequences with low similarity and are widely implemented in sequence alignment software. However, HMM-based sequence alignments can generate a large number of results, and quickly screening and batch-extracting target homologs from microbiomes remains a major sticking point. An integrated gene filtering and extraction pipeline is therefore needed to screen homologs quickly and accurately. Here, we introduce HMMER-Extractor, a supporting toolkit that extracts amino acid or nucleotide sequences using the provided filtering scores and an iterative keyword matching (IKM) logic. To make it more accessible, we further present a web server with a visual interface. Its interactive HTML output lets users browse homologous annotations and extracted sequences, giving the community a streamlined interface for analyzing microbiomes. Using HMMER-Extractor, we constructed a cardiovascular disease-related gene dataset for the macromolecular metabolites trimethylamine (TMA) and lipopolysaccharide (LPS) based on 46,699 bacterial genomes from the human gut. Approximately 21,014 and 1,961 bacterial strains were identified as containing the cnt or cut operon of TMA and the waa gene cluster of LPS, respectively. Escherichia coli accounted for the largest proportion of all bacterial species, which belonged to the phylum Firmicutes. The HMMER-Extractor toolkit is an integrated pipeline that proved accurate and fast in extracting genes encoding target macromolecules from microbial genomes.
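The abstract's filter-then-match idea can be illustrated with a minimal Python sketch. It is not the HMMER-Extractor implementation: the function names, the thresholds, and the reading of IKM as successive rounds of keyword narrowing are all assumptions made for illustration. The only external convention relied on is HMMER's `--tblout` per-target table, which has 18 whitespace-separated columns followed by a free-text description.

```python
from typing import Iterable, List, NamedTuple, Sequence


class Hit(NamedTuple):
    target: str
    query: str
    evalue: float       # full-sequence E-value
    score: float        # full-sequence bit score
    description: str


def parse_tblout(lines: Iterable[str]) -> List[Hit]:
    """Parse lines from an hmmsearch --tblout table into Hit records."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # 18 fixed columns, then the remainder of the line is the description.
        fields = line.split(maxsplit=18)
        if len(fields) < 18:
            continue  # malformed row
        description = fields[18].strip() if len(fields) > 18 else ""
        hits.append(Hit(fields[0], fields[2], float(fields[4]),
                        float(fields[5]), description))
    return hits


def filter_hits(hits: Sequence[Hit], max_evalue: float = 1e-5,
                min_score: float = 50.0) -> List[Hit]:
    """Keep hits passing both thresholds (threshold values are hypothetical)."""
    return [h for h in hits if h.evalue <= max_evalue and h.score >= min_score]


def iterative_keyword_match(hits: Sequence[Hit],
                            keyword_rounds: Sequence[Sequence[str]]) -> List[Hit]:
    """Assumed reading of IKM: each round keeps only hits whose description
    contains at least one keyword of that round (case-insensitive)."""
    remaining = list(hits)
    for keywords in keyword_rounds:
        lowered = [k.lower() for k in keywords]
        remaining = [h for h in remaining
                     if any(k in h.description.lower() for k in lowered)]
    return remaining


# Made-up rows in tblout column order, for demonstration only.
SAMPLE = [
    "# target name  accession  query name  accession  E-value  score  ...",
    "geneA - cutC - 1e-50 180.2 0.1 2e-50 179.0 0.1 1.0 1 0 0 1 1 1 1 choline trimethylamine-lyase",
    "geneB - cutC - 1e-03 20.5 0.0 1e-03 20.1 0.0 1.0 1 0 0 1 1 1 1 glycyl radical enzyme",
    "geneC - cutC - 3e-40 150.0 0.2 4e-40 149.1 0.2 1.0 1 0 0 1 1 1 1 hypothetical protein",
]

if __name__ == "__main__":
    kept = filter_hits(parse_tblout(SAMPLE))
    matched = iterative_keyword_match(kept, [["lyase", "hypothetical"], ["choline"]])
    print([h.target for h in matched])
```

Separating the score filter from the keyword rounds keeps each criterion independently adjustable; the identifiers of the surviving hits could then be used to pull the corresponding sequences from a FASTA file.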
About the journal:
The International Journal of Biological Macromolecules is a well-established international journal dedicated to research on the chemical and biological aspects of natural macromolecules. Focusing on proteins, macromolecular carbohydrates, glycoproteins, proteoglycans, lignins, biological poly-acids, and nucleic acids, the journal presents the latest findings in molecular structure, properties, biological activities, interactions, modifications, and functional properties. Papers must offer new and novel insights, encompassing related model systems, structural conformational studies, theoretical developments, and analytical techniques. Each paper is required to primarily focus on at least one named biological macromolecule, reflected in the title, abstract, and text.