DNA Sequence Perplexity Reveals Evolutionarily Conserved Patterns in cis-Regulatory Regions Across Diverse Species.

IF 1.6 | Zone 4, Biology | JCR Q4, BIOCHEMISTRY & MOLECULAR BIOLOGY
Aruna Sesha Chandrika Gummadi, Venkata Rajesh Yella
Journal: Biochemical Genetics (Journal Article). Published: 2025-08-21. DOI: 10.1007/s10528-025-11231-y (https://doi.org/10.1007/s10528-025-11231-y)

Abstract

Deciphering cis-regulatory regions in genomes is essential for understanding physiological processes and pathological mechanisms. Regulatory signatures, namely promoter motifs, transcription factor binding sites, enhancers, GC content, CpG islands, DNA structural motifs, and other cis-regulatory features, have well-established roles in transcriptional regulation. However, these features often vary between species, which makes it difficult to identify regulatory principles that are conserved across genomes. In this study, we introduce DNA sequence perplexity as an efficient information-theoretic metric for characterizing cis-regulatory regions. Derived from information theory and natural language processing, perplexity quantifies the complexity and predictability of a sequence, offering a motif-independent framework for DNA analysis. By examining transcription and translation start site regions across 1180 species spanning diverse taxa, we demonstrate that cis-regulatory regions consistently exhibit lower perplexity than adjacent flanking regions. This trend persists irrespective of taxonomic classification, establishing reduced perplexity as an evolutionarily conserved pattern of regulatory DNA. Additionally, we observe an inverse correlation between perplexity and promoter strength in yeast datasets, suggesting that higher transcriptional output is associated with markedly reduced sequence perplexity. Our findings suggest that perplexity may offer valuable insights into the generalizable aspects of cis-regulatory DNA architecture. Integrating this abstraction-based strategy with motif-based approaches and high-throughput functional datasets could broaden its use in predictive applications across comparative and functional genomics.
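
The abstract does not state how perplexity is computed, so the following is only a minimal sketch under one common assumption: perplexity is taken as 2^H, where H is the Shannon entropy (in bits) of the empirical k-mer distribution within a window. The function names `kmer_perplexity` and `sliding_perplexity`, and the parameters `k`, `window`, and `step`, are illustrative choices, not the authors' implementation.

```python
import math
from collections import Counter


def kmer_perplexity(seq: str, k: int = 3) -> float:
    """Perplexity (2 ** Shannon entropy) of the empirical k-mer distribution.

    Assumption: this entropy-based definition is one common choice; the
    paper's exact formulation may differ.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Drop k-mers that contain ambiguous bases such as N.
    kmers = [km for km in kmers if set(km) <= set("ACGT")]
    if not kmers:
        raise ValueError("sequence contains no valid k-mers")
    counts = Counter(kmers)
    total = len(kmers)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 2.0 ** entropy


def sliding_perplexity(seq: str, window: int = 100, step: int = 10, k: int = 3):
    """Return (window_start, perplexity) pairs along the sequence."""
    return [
        (start, kmer_perplexity(seq[start:start + window], k))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

Under this definition a homopolymer such as `"AAAAAAAA"` has perplexity 1.0 (fully predictable), while a window in which all four bases are equally frequent has a 1-mer perplexity of 4.0. Applying `sliding_perplexity` to a sequence centered on a start site and comparing the central windows with the flanking ones would mirror the kind of comparison the abstract describes, subject to the assumptions above.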

Source journal: Biochemical Genetics (Biology: Biochemistry & Molecular Biology)
CiteScore: 3.90 | Self-citation rate: 0.00% | Articles published: 133 | Review time: 4.8 months
About the journal: Biochemical Genetics welcomes original manuscripts that address and test clear scientific hypotheses, are directed to a broad scientific audience, and clearly contribute to the advancement of the field through the use of sound sampling or experimental design, reliable analytical methodologies and robust statistical analyses. Although studies focusing on particular regions and target organisms are welcome, it is not the journal’s goal to publish essentially descriptive studies that provide results with narrow applicability, or are based on very small samples or pseudoreplication. Rather, Biochemical Genetics welcomes review articles that go beyond summarizing previous publications and create added value through the systematic analysis and critique of the current state of knowledge or by conducting meta-analyses. Methodological articles are also within the scope of Biochemical Genetics, particularly when new laboratory techniques or computational approaches are fully described and thoroughly compared with the existing benchmark methods. Biochemical Genetics welcomes articles on the following topics: Genomics; Proteomics; Population genetics; Phylogenetics; Metagenomics; Microbial genetics; Genetics and evolution of wild and cultivated plants; Animal genetics and evolution; Human genetics and evolution; Genetic disorders; Genetic markers of diseases; Gene technology and therapy; Experimental and analytical methods; Statistical and computational methods.