{"title":"HD-6mAPred: a hybrid deep learning approach for accurate prediction of N6-methyladenine sites in plant species.","authors":"Huimin Li, Wei Gao, Yi Tang, Xiaotian Guo","doi":"10.7717/peerj.19463","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>N6-methyladenine (6mA) is an important DNA methylation modification that serves a crucial function in various biological activities. Accurate prediction of 6mA sites is essential for elucidating its biological function and underlying mechanism. Although existing methods have achieved great success, there remains a pressing need for improved prediction accuracy and generalization cap ability across diverse species. This study aimed to develop a robust method to address these challenges.</p><p><strong>Methods: </strong>We proposed HD-6mAPred, a hybrid deep learning model that combines bidirectional gated recurrent unit (BiGRU), convolutional neural network (CNN) and attention mechanism, along with various DNA sequence coding schemes. Firstly, DNA sequences were encoded using four different ways: one-hot encoding, electron-ion interaction pseudo-potential (EIIP), enhanced nucleic acid composition (ENAC) and nucleotide chemical properties (NCP). Secondly, a hold-out search strategy was employed to identify the optimal features or feature combinations for both BiGRU and CNN. Finally, the attention mechanism was introduced to weigh the importance of features derived from the BiGRU and CNN.</p><p><strong>Results: </strong>A series of experiments on the <i>Rosaceae</i>, rice and <i>Arabidopsis</i> datasets were conducted to demonstrate the superiority of HD-6mAPred. In <i>Rosaceae</i>, the HD-6mAPred model achieved excellent performance: accuracy (ACC) of 0.996, Matthew correlation coefficient (MCC) of 0.993, sensitivity (SN) and specificity (SP) of 0.995 and 0.998, respectively. In rice, the evaluation metrics are 0.952 (ACC), 0.905 (MCC), 0.955 (SN), and 0.949 (SP). In <i>Arabidopsis</i>, the corresponding metrics are 0.937 (ACC), 0.875 (MCC), 0.927 (SN), and 0.948 (SP). Compared to existing methods, these results demonstrate that HD-6mAPred achieves state-of-the-art performance in predicting 6mA sites across three plant species. Furthermore, HD-6mAPred not only improves the accuracy of 6mA site prediction, but also shows excellent generalization capability across species. The source code utilized in this study is publicly accessible at https://doi.org/10.5281/zenodo.15355131.</p>","PeriodicalId":19799,"journal":{"name":"PeerJ","volume":"13 ","pages":"e19463"},"PeriodicalIF":2.3000,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12085883/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"PeerJ","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.7717/peerj.19463","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
引用次数: 0
Abstract
Background: N6-methyladenine (6mA) is an important DNA methylation modification that serves a crucial function in various biological activities. Accurate prediction of 6mA sites is essential for elucidating its biological function and underlying mechanism. Although existing methods have achieved great success, there remains a pressing need for improved prediction accuracy and generalization cap ability across diverse species. This study aimed to develop a robust method to address these challenges.
Methods: We proposed HD-6mAPred, a hybrid deep learning model that combines bidirectional gated recurrent unit (BiGRU), convolutional neural network (CNN) and attention mechanism, along with various DNA sequence coding schemes. Firstly, DNA sequences were encoded using four different ways: one-hot encoding, electron-ion interaction pseudo-potential (EIIP), enhanced nucleic acid composition (ENAC) and nucleotide chemical properties (NCP). Secondly, a hold-out search strategy was employed to identify the optimal features or feature combinations for both BiGRU and CNN. Finally, the attention mechanism was introduced to weigh the importance of features derived from the BiGRU and CNN.
Results: A series of experiments on the Rosaceae, rice and Arabidopsis datasets were conducted to demonstrate the superiority of HD-6mAPred. In Rosaceae, the HD-6mAPred model achieved excellent performance: accuracy (ACC) of 0.996, Matthew correlation coefficient (MCC) of 0.993, sensitivity (SN) and specificity (SP) of 0.995 and 0.998, respectively. In rice, the evaluation metrics are 0.952 (ACC), 0.905 (MCC), 0.955 (SN), and 0.949 (SP). In Arabidopsis, the corresponding metrics are 0.937 (ACC), 0.875 (MCC), 0.927 (SN), and 0.948 (SP). Compared to existing methods, these results demonstrate that HD-6mAPred achieves state-of-the-art performance in predicting 6mA sites across three plant species. Furthermore, HD-6mAPred not only improves the accuracy of 6mA site prediction, but also shows excellent generalization capability across species. The source code utilized in this study is publicly accessible at https://doi.org/10.5281/zenodo.15355131.
期刊介绍:
PeerJ is an open access peer-reviewed scientific journal covering research in the biological and medical sciences. At PeerJ, authors take out a lifetime publication plan (for as little as $99) which allows them to publish articles in the journal for free, forever. PeerJ has 5 Nobel Prize Winners on the Board; they have won several industry and media awards; and they are widely recognized as being one of the most interesting recent developments in academic publishing.