Title: PTF-Vāc: An explainable and generative deep co-learning encoders-decoders system for ab-initio discovery of plant transcription factor binding sites.
Authors: Sagar Gupta, Jyoti, Umesh Bhati, Veerbhan Kesarwani, Akanksha Sharma, Ravi Shankar
Journal: Plant Communications, article 101543 (impact factor 11.6; JCR Q1, Biochemistry & Molecular Biology)
DOI: https://doi.org/10.1016/j.xplc.2025.101543
Published: 2025-09-30
Publication type: Journal Article
Citations: 0
Abstract
Discovering transcription factor (TF) binding sites (TFBSs) and their motifs in plants poses significant challenges because of high cross-species variability. The interaction between a TF and its binding sites is highly specific and context dependent. Most existing TFBS-finding tools are not accurate enough to discover these binding sites in plants: they fail to capture cross-species variability, the interdependence between TF structure and its TFBSs, and the context specificity of binding. Because they are coupled to a predefined TF-specific model or matrix, they are highly sensitive to the volume and quality of the data used to build the motifs. All of these tools also presume that the user's input is specific to one particular TF, which makes them of very limited use for practical applications such as genomic annotation of newly sequenced species. Here, we report PTF-Vāc, an explainable deep encoders-decoders generative system built on PTFSpot, a universal model of deep co-learning on variability in binding sites and TF structure, which frees it from the bottlenecks mentioned above. PTF-Vāc decouples TFBS discovery from the prior step of motif finding and from the requirement for TF-specific motif models. Guided by the universal model of TF:DNA interactions, it can discover binding motifs independently of data volume, species, and TF-specific models. In a comprehensive benchmarking study across a large volume of experimental data, it outperformed most advanced motif-finding deep learning (DL) algorithms. With this, PTF-Vāc opens a new chapter in ab-initio TFBS discovery through generative AI.
About the journal:
Plant Communications is an open access publishing platform that supports the global plant science community. It publishes original research, review articles, technical advances, and research resources in various areas of plant sciences. The scope of topics includes evolution, ecology, physiology, biochemistry, development, reproduction, metabolism, molecular and cellular biology, genetics, genomics, environmental interactions, biotechnology, breeding of higher and lower plants, and their interactions with other organisms. The goal of Plant Communications is to provide a high-quality platform for the dissemination of plant science research.