HD-6mAPred: a hybrid deep learning approach for accurate prediction of N6-methyladenine sites in plant species.

IF 2.3 · Zone 3 (Biology) · JCR Q2, Multidisciplinary Sciences
PeerJ · Pub date: 2025-05-15 · eCollection date: 2025-01-01 · DOI: 10.7717/peerj.19463
Huimin Li, Wei Gao, Yi Tang, Xiaotian Guo
PeerJ 13: e19463 · Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12085883/pdf/
Citations: 0

Abstract

Background: N6-methyladenine (6mA) is an important DNA methylation modification that plays a crucial role in various biological processes. Accurate prediction of 6mA sites is essential for elucidating its biological functions and underlying mechanisms. Although existing methods have achieved considerable success, there remains a pressing need to improve prediction accuracy and generalization capability across diverse species. This study aimed to develop a robust method that addresses these challenges.

Methods: We proposed HD-6mAPred, a hybrid deep learning model that combines a bidirectional gated recurrent unit (BiGRU), a convolutional neural network (CNN), and an attention mechanism with several DNA sequence encoding schemes. First, DNA sequences were encoded in four ways: one-hot encoding, electron-ion interaction pseudo-potential (EIIP), enhanced nucleic acid composition (ENAC), and nucleotide chemical properties (NCP). Second, a hold-out search strategy was used to identify the optimal feature or feature combination for the BiGRU and for the CNN. Finally, an attention mechanism was introduced to weigh the importance of the features derived from the BiGRU and the CNN.
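The four encodings named in the Methods are standard nucleotide feature schemes. The following is a minimal pure-Python sketch of them as they are commonly defined, not the authors' implementation (their code is in the Zenodo archive). It assumes an A, C, G, T column order for one-hot and ENAC, the commonly cited EIIP values, the usual NCP triples (ring structure, functional group, hydrogen bonding), an ENAC window of 5, and uppercase A/C/G/T input. `combine` is a hypothetical helper showing how per-position encodings could be stacked into one feature matrix when testing feature combinations.

```python
# Minimal sketch of four DNA encodings (not the HD-6mAPred source code).
# Assumptions: A/C/G/T column order, commonly cited EIIP and NCP values,
# ENAC window size 5, uppercase A/C/G/T input only.
from typing import Dict, List, Sequence

ALPHABET = "ACGT"

# Electron-ion interaction pseudo-potential of each nucleotide.
EIIP: Dict[str, float] = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

# Nucleotide chemical properties: (ring structure, functional group, hydrogen bonding).
NCP: Dict[str, List[int]] = {"A": [1, 1, 1], "C": [0, 1, 0], "G": [1, 0, 0], "T": [0, 0, 1]}


def one_hot(seq: str) -> List[List[int]]:
    """L x 4 matrix with a single 1 per position."""
    return [[1 if base == b else 0 for b in ALPHABET] for base in seq]


def eiip(seq: str) -> List[List[float]]:
    """L x 1 matrix of EIIP values."""
    return [[EIIP[base]] for base in seq]


def ncp(seq: str) -> List[List[int]]:
    """L x 3 matrix of chemical-property bits."""
    return [list(NCP[base]) for base in seq]


def enac(seq: str, window: int = 5) -> List[List[float]]:
    """(L - window + 1) x 4 matrix of base frequencies in each sliding window."""
    rows = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        rows.append([chunk.count(b) / window for b in ALPHABET])
    return rows


def combine(*matrices: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical helper: concatenate per-position encodings column-wise."""
    if len({len(m) for m in matrices}) != 1:
        raise ValueError("encodings must have the same number of positions")
    return [[v for row in rows for v in row] for rows in zip(*matrices)]


seq = "GATTACA"
x = combine(one_hot(seq), eiip(seq), ncp(seq))
print(len(x), len(x[0]))  # 7 8
print(enac(seq)[0])       # [0.4, 0.0, 0.2, 0.4]  (window "GATTA")
```

ENAC yields L - w + 1 rows rather than L, so in this sketch it cannot be stacked position-by-position with the other three encodings without padding or a separate input; the abstract does not say how HD-6mAPred handles this.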

Results: A series of experiments on the Rosaceae, rice, and Arabidopsis datasets was conducted to demonstrate the superiority of HD-6mAPred. On Rosaceae, HD-6mAPred achieved an accuracy (ACC) of 0.996, a Matthews correlation coefficient (MCC) of 0.993, and a sensitivity (SN) and specificity (SP) of 0.995 and 0.998, respectively. On rice, the metrics were 0.952 (ACC), 0.905 (MCC), 0.955 (SN), and 0.949 (SP). On Arabidopsis, the corresponding metrics were 0.937 (ACC), 0.875 (MCC), 0.927 (SN), and 0.948 (SP). Compared with existing methods, these results show that HD-6mAPred achieves state-of-the-art performance in predicting 6mA sites across the three plant species. Beyond improving the accuracy of 6mA site prediction, HD-6mAPred also shows strong generalization across species. The source code used in this study is publicly available at https://doi.org/10.5281/zenodo.15355131.

Source journal: PeerJ (Multidisciplinary Sciences)
CiteScore: 4.70
Self-citation rate: 3.70%
Articles published: 1665
Review time: 10 weeks
About the journal: PeerJ is an open-access, peer-reviewed scientific journal covering research in the biological and medical sciences. At PeerJ, authors take out a lifetime publication plan (for as little as $99) that allows them to publish articles in the journal for free, forever. PeerJ has five Nobel Prize winners on its board, has won several industry and media awards, and is widely recognized as one of the most interesting recent developments in academic publishing.