HMMER-Extractor: an auxiliary toolkit for identifying genomic macromolecular metabolites based on Hidden Markov Models.

IF 7.7 · Zone 1, Chemistry · Q1, Biochemistry & Molecular Biology

International Journal of Biological Macromolecules. Pub Date: 2024-12-01. Epub Date: 2024-11-17. DOI: 10.1016/j.ijbiomac.2024.137666

Jing Yang, Siqi Sun, Ning Sun, Li Lu, Chengwu Zhang, Wanyu Shi, Yunhe Zhao, Shulei Jia


Citations: 0

Abstract

The human microbiome contains various microbial macromolecules with important biological functions. Hidden Markov Models (HMMs) can detect distantly related sequences with low similarity and are widely implemented in sequence alignment software. However, HMM-based sequence alignments can generate a large number of results, and quickly screening and batch-extracting target homologs from microbiomes remains a major sticking point. An integrated gene filtering and extraction pipeline is therefore needed to screen homologs quickly and accurately. Here, we introduce HMMER-Extractor, a supporting toolkit that extracts amino acid or nucleotide sequences based on provided filtering scores and an iterative keyword matching (IKM) logic. To make it more user-friendly and accessible, we further present a web server platform with a visual interface. Its interactive HTML output offers a convenient way to browse homology annotations and sequence extraction results, giving the community a streamlined interface for analyzing microbiomes. Using HMMER-Extractor, we constructed a cardiovascular disease-related gene dataset for the macromolecular metabolites trimethylamine (TMA) and lipopolysaccharide (LPS) based on 46,699 bacterial genomes from the human gut. Approximately 21,014 and 1,961 bacterial strains were identified as containing the cnt or cut operon of TMA and the waa gene cluster of LPS, respectively. Escherichia coli accounted for the largest proportion among all the bacterial species, which belonged to the phylum Firmicutes. The HMMER-Extractor toolkit is an integrated pipeline that proved accurate and fast in extracting target macromolecule-encoding genes from microbial genomes.
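The filter-then-extract workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the HMMER-Extractor implementation: the abstract does not specify how the IKM logic works, so the successive-rounds keyword filter below is only one possible reading, and all function names, thresholds, and sample records are hypothetical. The only external fact it relies on is the column layout of HMMER's `--tblout` per-target table (18 whitespace-separated fields followed by a free-text description).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Hit:
    target: str       # target sequence name (column 1 of --tblout)
    query: str        # query/profile name (column 3)
    evalue: float     # full-sequence E-value (column 5)
    score: float      # full-sequence bit score (column 6)
    description: str  # free-text target description (column 19 onward)


def parse_tblout(lines: Iterable[str]) -> List[Hit]:
    """Parse HMMER --tblout rows, skipping '#' comment lines."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(None, 18)  # 18 fixed columns + description
        if len(fields) < 18:
            continue  # malformed row
        desc = fields[18].strip() if len(fields) > 18 else ""
        hits.append(Hit(fields[0], fields[2], float(fields[4]),
                        float(fields[5]), desc))
    return hits


def filter_by_score(hits: List[Hit], max_evalue: float = 1e-5,
                    min_score: float = 50.0) -> List[Hit]:
    """Keep hits passing both thresholds (threshold values are hypothetical)."""
    return [h for h in hits if h.evalue <= max_evalue and h.score >= min_score]


def iterative_keyword_match(hits: List[Hit],
                            keyword_rounds: List[List[str]]) -> List[Hit]:
    """Assumed IKM reading: each round keeps hits whose description contains
    at least one keyword of that round, narrowing the set round by round."""
    kept = list(hits)
    for keywords in keyword_rounds:
        lowered = [k.lower() for k in keywords]
        kept = [h for h in kept
                if any(k in h.description.lower() for k in lowered)]
    return kept


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA records into {sequence id: sequence}."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {k: "".join(v) for k, v in chunks.items()}


def extract_sequences(hits: List[Hit], seqs: Dict[str, str]) -> Dict[str, str]:
    """Collect the sequences of the retained hits."""
    return {h.target: seqs[h.target] for h in hits if h.target in seqs}
```

A typical call chain under these assumptions would be `extract_sequences(iterative_keyword_match(filter_by_score(parse_tblout(table)), rounds), read_fasta(fasta))`, where `table` and `fasta` are open file handles or lists of lines and `rounds` is a list of keyword lists.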


Source journal

International Journal of Biological Macromolecules (Biology: Biochemistry & Molecular Biology)

CiteScore: 13.70

Self-citation rate: 9.80%

Articles published: 2728

Review time: 64 days

Journal description: The International Journal of Biological Macromolecules is a well-established international journal dedicated to research on the chemical and biological aspects of natural macromolecules. Focusing on proteins, macromolecular carbohydrates, glycoproteins, proteoglycans, lignins, biological poly-acids, and nucleic acids, the journal presents the latest findings in molecular structure, properties, biological activities, interactions, modifications, and functional properties. Papers must offer novel insights, encompassing related model systems, structural conformational studies, theoretical developments, and analytical techniques. Each paper is required to focus primarily on at least one named biological macromolecule, reflected in the title, abstract, and text.