Deciphering the biosynthetic potential of microbial genomes using a BGC language processing neural network model
Qilong Lai, Shuai Yao, Yuguo Zha, Haohong Zhang, Haobo Zhang, Ying Ye, Yonghui Zhang, Hong Bai, Kang Ning
Nucleic Acids Research, published 2025-04-14. DOI: 10.1093/nar/gkaf305
Abstract
Biosynthetic gene clusters (BGCs), which are key to synthesizing microbial secondary metabolites, are mostly hidden in microbial genomes and metagenomes. To unearth this vast potential, we present BGC-Prophet, a transformer-based language model for BGC prediction and classification. By leveraging a transformer encoder, BGC-Prophet captures location-dependent relationships between genes. As one of the pioneering ultrahigh-throughput tools, BGC-Prophet significantly surpasses existing methods in efficiency and fidelity, enabling comprehensive pan-phylogenetic and whole-metagenome BGC screening. Through the analysis of 85 203 genomes and 9428 metagenomes, BGC-Prophet has profiled a sub-million-scale collection of BGCs. The analysis highlights notable BGC enrichment in phyla such as Actinomycetota and the widespread distribution of polyketide, nonribosomal peptide (NRP), and ribosomally synthesized and post-translationally modified peptide (RiPP) BGCs across diverse lineages. It also reveals enrichment patterns of BGCs following important geological events, suggesting environmental influences on BGC evolution. By detecting BGCs and uncovering their evolutionary patterns, BGC-Prophet contributes to a deeper understanding of microbial secondary metabolites and to their application in synthetic biology.
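The abstract says only that a transformer encoder lets BGC-Prophet capture location-dependent relationships between genes; it gives no architectural details. As a rough illustration of that general idea, and not the authors' implementation, the sketch below treats each gene in a genomic window as a token, adds a sinusoidal positional encoding so that gene order matters, and applies single-head scaled dot-product self-attention so each gene's representation mixes in information from the other genes. The toy gene vectors, the identity query/key/value projections, the logistic per-gene scoring head, and all function names are hypothetical simplifications; a real model would use learned projections, multiple heads and layers, and a trained classifier.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def positional_encoding(num_positions: int, dim: int) -> Matrix:
    # Sinusoidal encoding: even dims use sin, odd dims use cos, one frequency per dim pair.
    pe = []
    for pos in range(num_positions):
        row = []
        for c in range(dim):
            angle = pos / (10000 ** (2 * (c // 2) / dim))
            row.append(math.sin(angle) if c % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Tuple[Matrix, Matrix]:
    # Single-head scaled dot-product attention; Q = K = V = x (identity projections, a simplification).
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Each output vector is an attention-weighted average of all gene vectors.
        outputs.append([sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)])
    return outputs, weights


def encode_genes(gene_embeddings: Matrix) -> Tuple[Matrix, Matrix]:
    # Add each gene's position vector to its embedding, then let every gene attend to all others.
    n, d = len(gene_embeddings), len(gene_embeddings[0])
    pe = positional_encoding(n, d)
    x = [[g + p for g, p in zip(gene, pos)] for gene, pos in zip(gene_embeddings, pe)]
    return self_attention(x)


def gene_scores(encoded: Matrix, weights: List[float], bias: float = 0.0) -> List[float]:
    # Hypothetical per-gene head: a logistic function giving a score in (0, 1) for each gene.
    return [
        1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(weights, row)) + bias)))
        for row in encoded
    ]


# Toy example: four genes, each represented by a hypothetical 4-dimensional embedding.
genes = [
    [0.1, 0.2, 0.0, 0.5],
    [0.9, 0.1, 0.3, 0.0],
    [0.2, 0.8, 0.1, 0.4],
    [0.0, 0.3, 0.7, 0.2],
]
encoded, attention = encode_genes(genes)
scores = gene_scores(encoded, weights=[0.5, -0.2, 0.3, 0.1])
print([round(s, 3) for s in scores])
```

Without the positional term, self-attention is permutation-equivariant: reversing the gene order would simply reverse the outputs. With it, reversing the genes changes the encoded vectors themselves, which is one simple way a model can become sensitive to where a gene sits relative to its neighbours.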
About the journal:
Nucleic Acids Research (NAR) is a scientific journal that publishes research on various aspects of nucleic acids and proteins involved in nucleic acid metabolism and interactions. It covers areas such as chemistry and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, genome integrity, repair and replication, genomics, molecular biology, nucleic acid enzymes, RNA, and structural biology. The journal also includes a Survey and Summary section for brief reviews. Additionally, each year, the first issue is dedicated to biological databases, and an issue in July focuses on web-based software resources for the biological community. Nucleic Acids Research is indexed by several services including Abstracts on Hygiene and Communicable Diseases, Animal Breeding Abstracts, Agricultural Engineering Abstracts, Agbiotech News and Information, BIOSIS Previews, CAB Abstracts, and EMBASE.