Circular code identified by the codon usage

IF 2.0 · Zone 4 (Biology) · JCR Q2 (BIOLOGY)

Biosystems, volume 244, Article 105308 · Pub Date: 2024-08-17 · DOI: 10.1016/j.biosystems.2024.105308

Christian J. Michel


Abstract

Since 1996, circular codes in genes have been identified thanks to the development of 6 statistical approaches: trinucleotide frequencies per frame (Arquès and Michel, 1996), correlation functions per frame (Arquès and Michel, 1997), frame permuted trinucleotide frequencies (Frey and Michel, 2003, 2006), advanced statistical functions at the gene population level (Michel, 2015) and at the gene level (Michel, 2017). All these 3-frame statistical methods analyse the trinucleotide information in the 3 frames of genes: the reading frame and the 2 shifted frames. Notably, codon usage does not allow for the identification of circular codes (Michel, 2020). This has been a long-standing problem since 1996, hindering biologists’ access to circular code theory.
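
To make the phrase "3-frame statistical methods" concrete, the following is a minimal sketch of counting trinucleotides in the reading frame and in the 2 shifted frames of a single sequence. It is not taken from the paper: the function name `trinucleotide_counts` and the toy sequence are hypothetical, and the cited methods apply considerably more elaborate statistics on top of such counts.

```python
from collections import Counter

def trinucleotide_counts(seq: str, frame: int) -> Counter:
    """Count trinucleotides read in a given frame.

    frame 0 is the reading frame; frames 1 and 2 are the two shifted frames.
    Trinucleotides containing letters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    return Counter(
        seq[i:i + 3]
        for i in range(frame, len(seq) - 2, 3)
        if set(seq[i:i + 3]) <= set("ACGT")
    )

# Hypothetical toy sequence, for illustration only.
gene = "ATGGCCGACGTTTAA"
counts_by_frame = [trinucleotide_counts(gene, f) for f in range(3)]
```

Here `counts_by_frame[0]` is the codon usage of the toy sequence, which is the only information a 1-frame method would use.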

By considering circular code conditions resulting from code theory, particularly the concept of permutation class, and building upon previous statistical work, a new statistical approach based solely on the codon usage, i.e. a 1-frame statistical method, surprisingly reveals the maximal $C^{3}$ self-complementary trinucleotide circular code $X$ in bacterial genes and in average (bacterial, archaeal, eukaryotic) genes, and almost in archaeal genes. Additionally, a new parameter definition indicates that bacterial and archaeal genes exhibit codon usage dispersion of the same order of magnitude, but significantly higher than that observed in eukaryotic genes. This statistical finding may explain the greater variability of codes in eukaryotic genes compared to bacterial and archaeal genes, an issue that has been open for many years. Finally, biologists can now search for new (variant) circular codes at both the genome level (across all genes in a given genome) and the gene level using only codon usage, without the need for analysing the shifted frames.
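
The role of the permutation class can be illustrated with a short sketch. A trinucleotide and its two circular shifts (for example ACG, CGA, GAC) form one permutation class, and a circular code can contain at most one trinucleotide from each class; excluding AAA, CCC, GGG and TTT leaves 20 classes, which is why a maximal code has 20 trinucleotides. The code below is an assumption-laden illustration, not the paper's procedure: `candidate_code` simply picks the most used codon in each class, which satisfies this necessary condition but does not by itself prove circularity. The set `X` is the code as reported in the circular code literature (Arquès and Michel, 1996); it is not listed in this abstract.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def permutation_class(t: str) -> frozenset:
    """Return the trinucleotide together with its two circular shifts."""
    return frozenset({t, t[1:] + t[0], t[2] + t[:2]})

def reverse_complement(t: str) -> str:
    return t.translate(COMPLEMENT)[::-1]

# The 20 permutation classes of size 3 (AAA, CCC, GGG, TTT form classes of size 1).
CLASSES = sorted(
    (c for c in {permutation_class("".join(p)) for p in product("ACGT", repeat=3)}
     if len(c) == 3),
    key=sorted,
)

def candidate_code(codon_usage: dict) -> set:
    """Naive illustration: keep the most used codon of each permutation class."""
    return {max(sorted(c), key=lambda t: codon_usage.get(t, 0)) for c in CLASSES}

def is_self_complementary(code: set) -> bool:
    """A code is self-complementary if it equals its set of reverse complements."""
    return {reverse_complement(t) for t in code} == set(code)

# The code X as reported in the literature (not given in this abstract).
X = {"AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
     "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC"}

one_per_class = all(len(c & X) == 1 for c in CLASSES)
```

A candidate obtained this way from real codon usage would still need to be tested for circularity and for the $C^{3}$ property with the tools of code theory.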


Source journal

Biosystems (Biology - Biology)

CiteScore

3.70

Self-citation rate

18.80%

Articles published

129

Review time

34 days

Journal description: BioSystems encourages experimental, computational, and theoretical articles that link biology, evolutionary thinking, and the information processing sciences. The link areas form a circle that encompasses the fundamental nature of biological information processing, computational modeling of complex biological systems, evolutionary models of computation, the application of biological principles to the design of novel computing systems, and the use of biomolecular materials to synthesize artificial systems that capture essential principles of natural biological information processing.