Circular code identified by the codon usage

IF 4.6 Q2 MATERIALS SCIENCE, BIOMATERIALS
Christian J. Michel
DOI: 10.1016/j.biosystems.2024.105308
Journal: ACS Applied Bio Materials
Publication date: 2024-08-17
Publication type: Journal Article
URL: https://www.sciencedirect.com/science/article/pii/S030326472400193X
Citations: 0

Abstract

Since 1996, circular codes in genes have been identified thanks to the development of 6 statistical approaches: trinucleotide frequencies per frame (Arquès and Michel, 1996), correlation functions per frame (Arquès and Michel, 1997), frame permuted trinucleotide frequencies (Frey and Michel, 2003, 2006), advanced statistical functions at the gene population level (Michel, 2015) and at the gene level (Michel, 2017). All these 3-frame statistical methods analyse the trinucleotide information in the 3 frames of genes: the reading frame and the 2 shifted frames. Notably, codon usage does not allow for the identification of circular codes (Michel, 2020). This has been a long-standing problem since 1996, hindering biologists’ access to circular code theory.
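
To make the 3-frame idea concrete, here is a minimal sketch of counting trinucleotides in the reading frame and in the 2 shifted frames of a single gene. The function name `trinucleotide_counts_per_frame` is hypothetical, and the sketch only illustrates the raw counting step. It does not reproduce any of the statistical functions cited above.

```python
from collections import Counter


def trinucleotide_counts_per_frame(gene):
    """Count trinucleotides in frame 0 (reading frame) and frames 1 and 2.

    Assumes `gene` is a DNA string that starts in the reading frame.
    Returns a list of three Counters, one per frame.
    """
    gene = gene.upper()
    counts = []
    for shift in range(3):
        # Step by 3 from the shifted start; incomplete trailing words are skipped.
        frame = Counter(
            gene[i:i + 3] for i in range(shift, len(gene) - 2, 3)
        )
        counts.append(frame)
    return counts
```

The counter at index 0 is the codon usage of the gene. A 1-frame method, as described in the next paragraph, would use only that counter.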

By considering circular code conditions resulting from code theory, particularly the concept of permutation class, and building upon previous statistical work, a new statistical approach based solely on the codon usage, i.e. a 1-frame statistical method, surprisingly reveals the maximal C3 self-complementary trinucleotide circular code X in bacterial genes and in average (bacterial, archaeal, eukaryotic) genes, and almost in archaeal genes. Additionally, a new parameter definition indicates that bacterial and archaeal genes exhibit codon usage dispersion of the same order of magnitude, but significantly higher than that observed in eukaryotic genes. This statistical finding may explain the greater variability of codes in eukaryotic genes compared to bacterial and archaeal genes, an issue that has been open for many years. Finally, biologists can now search for new (variant) circular codes at both the genome level (across all genes in a given genome) and the gene level using only codon usage, without the need for analysing the shifted frames.
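
The role of permutation classes can be illustrated with a short sketch. The 60 non-periodic trinucleotides split into 20 classes of the form {t, P(t), P(P(t))}, where P is the circular permutation, and a circular code contains at most one trinucleotide per class. The code below picks the most used codon in each class and then tests circularity with the graph criterion (a trinucleotide code is circular if and only if its associated graph is acyclic). The selection rule and all function names are assumptions made for illustration. The abstract does not give the details of the paper's statistic, so this should not be read as the author's method.

```python
from itertools import product


def permutation_classes():
    """Return the 20 classes {t, P(t), P(P(t))} of non-periodic trinucleotides."""
    classes = set()
    for t in map("".join, product("ACGT", repeat=3)):
        if t[0] == t[1] == t[2]:
            continue  # AAA, CCC, GGG, TTT cannot belong to a circular code
        classes.add(frozenset({t, t[1:] + t[0], t[2] + t[:2]}))
    return sorted(sorted(c) for c in classes)


def candidate_code(codon_usage):
    """Pick the most used codon in each class (ties: alphabetical order).

    Illustrative rule only: one codon per class is necessary for
    circularity, not sufficient, so the result must still be checked.
    """
    return {
        max(c, key=lambda t: codon_usage.get(t, 0))
        for c in permutation_classes()
    }


def is_circular(code):
    """Graph criterion: edges N1 -> N2N3 and N1N2 -> N3 must form no cycle."""
    graph = {}
    for t in code:
        graph.setdefault(t[0], set()).add(t[1:])
        graph.setdefault(t[:2], set()).add(t[2])
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def has_cycle(v):
        state[v] = 1
        for w in graph.get(v, ()):
            if state.get(w) == 1 or (w not in state and has_cycle(w)):
                return True
        state[v] = 2
        return False

    return not any(v not in state and has_cycle(v) for v in list(graph))
```

Given a codon usage table such as a `dict` or `collections.Counter` mapping codons to counts, `is_circular(candidate_code(usage))` reports whether this naive per-class selection happens to yield a circular code. Because only reading-frame counts are needed, the same sketch applies to one gene or to the pooled genes of a genome.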

Source journal: ACS Applied Bio Materials
Subject area: Chemistry-Chemistry (all)
CiteScore: 9.40
Self-citation rate: 2.10%
Publication count: 464