Deciphering the sequence basis and application of transcriptional initiation regulation in plant genomes through deep learning
Pengfei Gao, Lijie Lian, Wanjie Feng, Yuxue Ma, Jieni Lin, Liya Qin, Shanmeng Hao, Haonan Zhao, Xuantong Liu, Jing Yuan, Zongcheng Lin, Xia Li, Yuefeng Guan, Xutong Wang
Genome Biology, published 2025-09-22. DOI: 10.1186/s13059-025-03782-5 (https://doi.org/10.1186/s13059-025-03782-5)
Transcription initiation is a key checkpoint in plant gene regulation, yet the DNA features that determine where and how frequently genes start transcription remain unclear. We develop GenoRetriever, an interpretable deep learning model trained on base-pair-resolution STRIPE-seq data from multiple crop genomes, to systematically reveal and quantify the sequence code that governs transcription start sites (TSSs). Using TSS profiles from 16 soybean tissues and six additional crops, GenoRetriever identifies 27 core promoter motifs, including the canonical TATA box and initiator elements, that together dictate TSS choice and activity. Model interpretation shows how each motif modulates both initiation frequency and precise start site position; these effects are confirmed by in silico motif edits, saturation mutagenesis, and targeted promoter assays. A new telomere-to-telomere assembly of wild soybean, Glycine soja, reveals that 31.85% of natural promoter variants shift dominant motifs relative to cultivated soybean, explaining domestication-driven changes in transcriptional regulation. Cross-species comparisons further indicate that, although many motif functions are conserved, monocots and dicots display distinct motif frequencies and positional preferences. GenoRetriever provides an interpretable, cross-species framework for decoding transcription initiation in plants. By linking specific sequence motifs to quantitative transcriptional outcomes and validating these links experimentally, our study advances fundamental knowledge of promoter architecture and supplies a practical platform for rational engineering of gene expression in crop improvement and functional genomics.
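As a rough illustration of the in silico saturation mutagenesis mentioned in the abstract, the sketch below substitutes every base of a short sequence with each alternative and records the change in a score. The scoring function `toy_score`, the example sequence, and all names here are hypothetical stand-ins chosen for illustration; they are not GenoRetriever or its API, and a real analysis would call a trained model instead.

```python
from typing import Callable, Dict, Tuple

BASES = "ACGT"


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model: counts exact 'TATA' hits."""
    return float(sum(seq[i:i + 4] == "TATA" for i in range(len(seq) - 3)))


def saturation_mutagenesis(
    seq: str, score: Callable[[str], float]
) -> Dict[Tuple[int, str], float]:
    """Return the score change for every single-base substitution.

    Keys are (0-based position, alternative base); values are
    score(mutant) - score(reference).
    """
    ref = score(seq)
    effects: Dict[Tuple[int, str], float] = {}
    for pos, ref_base in enumerate(seq):
        for alt in BASES:
            if alt == ref_base:
                continue  # skip the reference base itself
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects[(pos, alt)] = score(mutant) - ref
    return effects


# Made-up 12-bp sequence containing one TATA-like element at positions 3-6.
promoter = "GGCTATAAAGGC"
effects = saturation_mutagenesis(promoter, toy_score)

# List the substitutions that change the toy score.
for (pos, alt), delta in sorted(effects.items()):
    if delta != 0:
        print(f"{promoter[pos]}{pos}{alt}: {delta:+.0f}")
```

With this toy scorer, substitutions inside the TATA element lower the score, while a substitution that creates a second overlapping element raises it. The same loop works with any scoring function that maps a sequence to a number, which is the only assumption the sketch makes about the model.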
Genome Biology
Biochemistry, Genetics and Molecular Biology: Genetics
CiteScore: 21.00
Self-citation rate: 3.30%
Articles published: 241
Review time: 2 months
About the journal:
Genome Biology stands as a premier platform for exceptional research across all domains of biology and biomedicine, explored through a genomic and post-genomic lens.
With an impressive impact factor of 12.3 (2022), the journal secures its position as the 3rd-ranked research journal in the Genetics and Heredity category and the 2nd-ranked research journal in the Biotechnology and Applied Microbiology category by Thomson Reuters. Notably, Genome Biology holds the distinction of being the highest-ranked open-access journal in this category.
Our dedicated team of highly trained in-house Editors collaborates closely with our esteemed Editorial Board of international experts, ensuring the journal remains at the forefront of scientific advances and community standards. Regular engagement with researchers at conferences and institute visits underscores our commitment to staying abreast of the latest developments in the field.