Differentially used codons among essential genes in bacteria identified by machine learning-based analysis.

IF 2.3 · CAS Zone 3 (Biology) · JCR Q3 (Biochemistry & Molecular Biology)
Annushree Kurmi, Piyali Sen, Madhusmita Dash, Suvendra Kumar Ray, Siddhartha Sankar Satapathy
Journal: Molecular Genetics and Genomics · Published: 2024-07-27 · Type: Journal Article · DOI: 10.1007/s00438-024-02163-0 (https://doi.org/10.1007/s00438-024-02163-0)
Citations: 0

Abstract



Codon usage bias (CUB), the uneven usage of synonymous codons encoding the same amino acid, differs among genes within and across bacterial genomes. CUB is known to be influenced by gene expression, and accordingly it differs between high-expression and low-expression genes in several bacteria. In this article, we extend the study of codon usage by considering gene essentiality as a feature. Using machine learning (ML) based approaches, we analysed Relative Synonymous Codon Usage (RSCU) values of essential and non-essential genes in Escherichia coli and thirty-four other bacterial genomes whose gene essentiality features were available in public databases. We observed significant differences in codon usage patterns between essential and non-essential genes for the majority of the bacterial genomes, and accordingly ML based classifiers achieved high area under curve (AUC) scores, with a minimum score of 70.0 across twenty-eight organisms. Further, the importance of individual codons for classifying genes was found to differ among the codons in each genome. Arg codon CGT and Gly codon GGT were observed to be the most preferred codons among essential genes in Escherichia coli. Interestingly, some codons, such as CGT, ATA, GGT and GGG, were observed to contribute consistently towards classifying essential genes across the thirty-five bacterial genomes studied. On the other hand, codons TGY and CAY, encoding the amino acids Cys and His respectively, were among the least contributing codons towards classification in all these bacteria. This study demonstrates gene essentiality based differences in synonymous codon usage in bacterial genomes and presents a common codon usage pattern across bacteria.
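To make the RSCU feature concrete, here is a minimal Python sketch of the standard RSCU definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It is an illustration only, not the authors' pipeline. The function name `rscu`, the toy sequence, and the restriction to the Arg and Gly codon families are assumptions made for brevity.

```python
from collections import Counter

# Synonymous codon families for two amino acids only (standard genetic code).
# Assumption for brevity: a full analysis would cover every amino acid
# that is encoded by more than one codon.
SYNONYMOUS_FAMILIES = {
    "Arg": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}


def rscu(sequence, families=SYNONYMOUS_FAMILIES):
    """Return {codon: RSCU} for one in-frame coding sequence.

    RSCU = observed codon count / (family total / number of synonymous codons).
    A value above 1 means the codon is used more often than expected under
    uniform synonymous usage; a value below 1 means it is used less often.
    """
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for codon_list in families.values():
        total = sum(counts[c] for c in codon_list)
        if total == 0:
            continue  # amino acid absent from this gene: RSCU is undefined
        expected = total / len(codon_list)
        for codon in codon_list:
            values[codon] = counts[codon] / expected
    return values


# Hypothetical toy sequence: CGT CGT CGT AGA GGT GGT GGG
toy_gene = "CGTCGTCGTAGAGGTGGTGGG"
toy_rscu = rscu(toy_gene)
for codon, value in sorted(toy_rscu.items()):
    print(codon, round(value, 3))
```

In a setup like the one the abstract describes, one such RSCU vector per gene would serve as a feature row, with the essential or non-essential label as the classification target.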

Source journal
Molecular Genetics and Genomics (Biology: Biochemistry & Molecular Biology)
CiteScore: 5.10
Self-citation rate: 3.20%
Articles published: 134
Review time: 1 month
About the journal: Molecular Genetics and Genomics (MGG) publishes peer-reviewed articles covering all areas of genetics and genomics. Any approach to the study of genes and genomes is considered, be it experimental, theoretical or synthetic. MGG publishes research on all organisms that is of broad interest to those working in the fields of genetics, genomics, biology, medicine and biotechnology. The journal investigates a broad range of topics, including these from recent issues: mechanisms for extending longevity in a variety of organisms; screening of yeast metal homeostasis genes involved in mitochondrial functions; molecular mapping of cultivar-specific avirulence genes in the rice blast fungus; and more.