DNA Sequence Perplexity Reveals Evolutionarily Conserved Patterns in cis-Regulatory Regions Across Diverse Species
Authors: Aruna Sesha Chandrika Gummadi, Venkata Rajesh Yella
Journal: Biochemical Genetics (impact factor 1.6; JCR Q4, Biochemistry & Molecular Biology)
Publication date: 2025-08-21
DOI: https://doi.org/10.1007/s10528-025-11231-y
Deciphering cis-regulatory regions in genomes is essential for understanding various physiological processes and pathological mechanisms. Regulatory signatures, namely promoter motifs, transcription factor binding sites, enhancers, GC content, CpG islands, DNA structural motifs, and other cis-regulatory features, have well-established roles in transcriptional regulation. However, these features often vary between species, which makes it difficult to identify regulatory principles that are conserved across genomes. In this study, we introduce DNA sequence perplexity as an efficient information-theoretic metric for characterizing cis-regulatory regions. Derived from information theory and natural language processing, perplexity quantifies the complexity and predictability of a sequence, offering a motif-independent framework for DNA analysis. By examining transcription and translation start site regions across 1180 species spanning diverse taxa, we demonstrate that cis-regulatory regions consistently exhibit lower perplexity than adjacent flanking regions. This trend persists irrespective of taxonomic classification, establishing low perplexity as an evolutionarily conserved feature of regulatory DNA. Additionally, we observe an inverse correlation between perplexity and promoter strength in yeast datasets, suggesting that higher transcriptional output is associated with markedly reduced sequence perplexity. Our findings indicate that perplexity may offer valuable insights into the generalizable aspects of cis-regulatory DNA architecture. Integrating this abstraction-based strategy with motif-based approaches and high-throughput functional datasets could broaden its use in predictive applications across comparative and functional genomics.
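The abstract does not state how perplexity is computed, so the following is only a minimal sketch of one common formulation: the perplexity of a sequence window taken as 2 raised to the Shannon entropy of its empirical k-mer distribution. The function names `kmer_perplexity` and `perplexity_profile`, and the defaults for `k`, `window`, and `step`, are illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter


def kmer_perplexity(seq: str, k: int = 3) -> float:
    """Perplexity 2**H of the empirical k-mer distribution of `seq`.

    Assumption: a simple frequency-based model; the paper may use a
    different language model or smoothing scheme.
    """
    seq = seq.upper()
    # Overlapping k-mers, e.g. "ACGT" with k=3 gives ["ACG", "CGT"].
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    counts = Counter(kmers)
    total = len(kmers)
    # Shannon entropy in bits of the observed k-mer frequencies.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 2.0 ** entropy


def perplexity_profile(seq: str, window: int = 50, step: int = 10, k: int = 3):
    """Return (start, perplexity) pairs for sliding windows along `seq`."""
    return [
        (start, kmer_perplexity(seq[start:start + window], k))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy sequence (not real data): a repetitive stretch followed by a mixed one.
    toy = "A" * 60 + "ACGT" * 15
    for start, ppl in perplexity_profile(toy, window=20, step=20, k=1):
        print(start, round(ppl, 3))
```

Under this formulation a homopolymer window has perplexity 1 and a window with all four bases equally frequent has perplexity 4 (for `k=1`), so a dip in the profile marks a more predictable stretch. Applied to windows around an annotated start site, such a profile could be compared against flanking windows in the spirit of the analysis described above.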
About the journal:
Biochemical Genetics welcomes original manuscripts that address and test clear scientific hypotheses, are directed to a broad scientific audience, and clearly contribute to the advancement of the field through the use of sound sampling or experimental design, reliable analytical methodologies and robust statistical analyses.
Although studies focusing on particular regions and target organisms are welcome, it is not the journal’s goal to publish essentially descriptive studies that provide results with narrow applicability, or are based on very small samples or pseudoreplication.
Rather, Biochemical Genetics welcomes review articles that go beyond summarizing previous publications and create added value through the systematic analysis and critique of the current state of knowledge or by conducting meta-analyses.
Methodological articles are also within the scope of Biochemical Genetics, particularly when new laboratory techniques or computational approaches are fully described and thoroughly compared with existing benchmark methods.
Biochemical Genetics welcomes articles on the following topics: Genomics; Proteomics; Population genetics; Phylogenetics; Metagenomics; Microbial genetics; Genetics and evolution of wild and cultivated plants; Animal genetics and evolution; Human genetics and evolution; Genetic disorders; Genetic markers of diseases; Gene technology and therapy; Experimental and analytical methods; Statistical and computational methods.