A novel method for predicting DNA N4-methylcytosine sites based on deep forest algorithm.

IF 0.9 4区 生物学 Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY
Yonglin Zhang, Mei Hu, Qi Mo, Wenli Gan, Jiesi Luo
{"title":"A novel method for predicting DNA N<sup>4</sup>-methylcytosine sites based on deep forest algorithm.","authors":"Yonglin Zhang,&nbsp;Mei Hu,&nbsp;Qi Mo,&nbsp;Wenli Gan,&nbsp;Jiesi Luo","doi":"10.1142/S0219720023500038","DOIUrl":null,"url":null,"abstract":"<p><p>N<sup>4</sup>-methyladenosine (4mC) methylation is an essential epigenetic modification of deoxyribonucleic acid (DNA) that plays a key role in many biological processes such as gene expression, gene replication and transcriptional regulation. Genome-wide identification and analysis of the 4mC sites can better reveal the epigenetic mechanisms that regulate various biological processes. Although some high-throughput genomic experimental methods can effectively facilitate the identification in a genome-wide scale, they are still too expensive and laborious for routine use. Computational methods can compensate for these disadvantages, but they still leave much room for performance improvement. In this study, we develop a non-NN-style deep learning-based approach for accurately predicting 4mC sites from genomic DNA sequence. We generate various informative features represented sequence fragments around 4mC sites, and subsequently implement them into a deep forest (DF) model. After training the deep model using 10-fold cross-validation, the overall accuracies of 85.0%, 90.0%, and 87.8% were achieved for three representative model organisms, <i>A. thaliana, C. elegans</i>, and <i>D. melanogaster</i>, respectively. In addition, extensive experiment results show that our proposed approach outperforms other existing state-of-the-art predictors in the 4mC identification. Our approach stands for the first DF-based algorithm for the prediction of 4mC sites, providing a novel idea in this field.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":null,"pages":null},"PeriodicalIF":0.9000,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Bioinformatics and Computational Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1142/S0219720023500038","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

N4-methyladenosine (4mC) methylation is an essential epigenetic modification of deoxyribonucleic acid (DNA) that plays a key role in many biological processes such as gene expression, gene replication and transcriptional regulation. Genome-wide identification and analysis of the 4mC sites can better reveal the epigenetic mechanisms that regulate various biological processes. Although some high-throughput genomic experimental methods can effectively facilitate the identification in a genome-wide scale, they are still too expensive and laborious for routine use. Computational methods can compensate for these disadvantages, but they still leave much room for performance improvement. In this study, we develop a non-NN-style deep learning-based approach for accurately predicting 4mC sites from genomic DNA sequence. We generate various informative features represented sequence fragments around 4mC sites, and subsequently implement them into a deep forest (DF) model. After training the deep model using 10-fold cross-validation, the overall accuracies of 85.0%, 90.0%, and 87.8% were achieved for three representative model organisms, A. thaliana, C. elegans, and D. melanogaster, respectively. In addition, extensive experiment results show that our proposed approach outperforms other existing state-of-the-art predictors in the 4mC identification. Our approach stands for the first DF-based algorithm for the prediction of 4mC sites, providing a novel idea in this field.

基于深度森林算法的DNA n4 -甲基胞嘧啶位点预测新方法。
n4 -甲基腺苷(4mC)甲基化是脱氧核糖核酸(DNA)必不可少的表观遗传修饰,在基因表达、基因复制和转录调控等许多生物学过程中起着关键作用。4mC位点的全基因组鉴定和分析可以更好地揭示调控各种生物过程的表观遗传机制。虽然一些高通量的基因组实验方法可以有效地促进全基因组范围内的鉴定,但它们仍然过于昂贵和费力,无法常规使用。计算方法可以弥补这些缺点,但它们仍然为性能改进留下了很大的空间。在这项研究中,我们开发了一种非神经网络风格的基于深度学习的方法,用于从基因组DNA序列中准确预测4mC位点。我们在4mC位点附近生成了代表序列片段的各种信息特征,并随后将其实现到深森林(DF)模型中。采用10倍交叉验证对深度模型进行训练后,拟南拟南、秀丽隐杆线虫和黑腹d.m anogaster 3种典型模式生物的总体准确率分别达到85.0%、90.0%和87.8%。此外,大量的实验结果表明,我们提出的方法在4mC识别方面优于其他现有的最先进的预测方法。我们的方法代表了第一个基于df的4mC位点预测算法,为该领域提供了一个新的思路。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Journal of Bioinformatics and Computational Biology
Journal of Bioinformatics and Computational Biology MATHEMATICAL & COMPUTATIONAL BIOLOGY-
CiteScore
2.10
自引率
0.00%
发文量
57
期刊介绍: The Journal of Bioinformatics and Computational Biology aims to publish high quality, original research articles, expository tutorial papers and review papers as well as short, critical comments on technical issues associated with the analysis of cellular information. The research papers will be technical presentations of new assertions, discoveries and tools, intended for a narrower specialist community. The tutorials, reviews and critical commentary will be targeted at a broader readership of biologists who are interested in using computers but are not knowledgeable about scientific computing, and equally, computer scientists who have an interest in biology but are not familiar with current thrusts nor the language of biology. Such carefully chosen tutorials and articles should greatly accelerate the rate of entry of these new creative scientists into the field.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信