Deciphering the sequence basis and application of transcriptional initiation regulation in plant genomes through deep learning

IF 10.1 | Zone 1 (Biology) | Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY
Pengfei Gao, Lijie Lian, Wanjie Feng, Yuxue Ma, Jieni Lin, Liya Qin, Shanmeng Hao, Haonan Zhao, Xuantong Liu, Jing Yuan, Zongcheng Lin, Xia Li, Yuefeng Guan, Xutong Wang
Journal: Genome Biology
Publication date: 2025-09-22
Publication type: Journal Article
DOI: 10.1186/s13059-025-03782-5 (https://doi.org/10.1186/s13059-025-03782-5)
Citations: 0

Abstract

Deciphering the sequence basis and application of transcriptional initiation regulation in plant genomes through deep learning
Transcription initiation is a key checkpoint in plant gene regulation, yet the DNA features that determine where and how frequently genes start transcription remain unclear. We develop GenoRetriever, an interpretable deep learning model trained on base-pair-resolution STRIPE-seq data from multiple crop genomes, to systematically reveal and quantify the sequence code that governs transcription start sites (TSSs). Using TSS profiles from 16 soybean tissues and six additional crops, GenoRetriever identifies 27 core promoter motifs, including the canonical TATA box and initiator elements, that together dictate TSS choice and activity. Model interpretation shows how each motif modulates both initiation frequency and precise start site position; these effects are confirmed by in silico motif edits, saturation mutagenesis, and targeted promoter assays. A new telomere-to-telomere assembly of wild soybean, Glycine soja, reveals that 31.85% of natural promoter variants shift dominant motifs relative to cultivated soybean, explaining domestication-driven changes in transcriptional regulation. Cross-species comparisons further indicate that, although many motif functions are conserved, monocots and dicots display distinct motif frequencies and positional preferences. GenoRetriever provides an interpretable, cross-species framework for decoding transcription initiation in plants. By linking specific sequence motifs to quantitative transcriptional outcomes and validating these links experimentally, our study advances fundamental knowledge of promoter architecture and supplies a practical platform for rational engineering of gene expression in crop improvement and functional genomics.
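The abstract mentions in silico saturation mutagenesis as one way the motif effects were checked. The general idea can be sketched as follows: substitute every position of a promoter sequence with each alternative base and record how a scoring function's output changes. This is only an illustration under stated assumptions. `toy_score` is a hypothetical stand-in that counts TATA matches; it is not GenoRetriever, and the example sequence and all function names are invented for the sketch.

```python
from typing import Callable, Dict, Tuple

BASES = "ACGT"


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model: counts exact 'TATA' matches."""
    return float(sum(seq[i:i + 4] == "TATA" for i in range(len(seq) - 3)))


def saturation_mutagenesis(
    seq: str, score: Callable[[str], float] = toy_score
) -> Dict[Tuple[int, str], float]:
    """Return the score change for every single-base substitution in seq.

    Keys are (0-based position, alternative base); values are
    score(mutant) - score(reference).
    """
    reference = score(seq)
    effects: Dict[Tuple[int, str], float] = {}
    for pos, ref_base in enumerate(seq):
        for alt in BASES:
            if alt == ref_base:
                continue  # skip the unchanged base
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects[(pos, alt)] = score(mutant) - reference
    return effects


# Made-up 12 bp sequence containing a single TATA match at positions 3-6.
promoter = "GGCTATAAAGGC"
effects = saturation_mutagenesis(promoter)
```

With a real model, `score` would wrap the model's predicted initiation signal for the sequence; positions whose substitutions produce large negative changes would be candidates for functionally important motif bases.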
Source journal: Genome Biology
Subject area: Biochemistry, Genetics and Molecular Biology - Genetics
CiteScore: 21.00
Self-citation rate: 3.30%
Articles published: 241
Review time: 2 months
About the journal: Genome Biology stands as a premier platform for exceptional research across all domains of biology and biomedicine, explored through a genomic and post-genomic lens. With an impact factor of 12.3 (2022), the journal is the 3rd-ranked research journal in the Genetics and Heredity category and the 2nd-ranked research journal in the Biotechnology and Applied Microbiology category by Thomson Reuters. Notably, Genome Biology is the highest-ranked open-access journal in this category. Our dedicated team of highly trained in-house Editors collaborates closely with our esteemed Editorial Board of international experts, ensuring the journal remains at the forefront of scientific advances and community standards. Regular engagement with researchers at conferences and institute visits underscores our commitment to staying abreast of the latest developments in the field.