N6-methyladenine identification using deep learning and discriminative feature integration.

Impact factor: 2.0 | Zone 4 (Medicine) | JCR Q3: Genetics & Heredity

BMC Medical Genomics | Pub Date: 2025-03-29 | DOI: 10.1186/s12920-025-02131-6

Salman Khan, Islam Uddin, Sumaiya Noor, Salman A AlQahtani, Nijad Ahmad

BMC Medical Genomics, vol. 18(1), article 58. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11955129/pdf/

Citations: 0

Abstract

N6-methyladenine (6mA) is a DNA modification that plays a crucial role in epigenetic regulation, gene expression, and various biological processes. With advances in sequencing technologies and computational biology, there is increasing focus on developing accurate methods for 6mA site identification, both to enable early detection and to understand the modification's biological significance. Despite the rapid progress of machine learning in bioinformatics, accurately detecting 6mA sites remains a challenge because existing approaches have limited generalizability and efficiency. In this study, we present Deep-N6mA, a novel Deep Neural Network (DNN) model incorporating optimal hybrid features for precise 6mA site identification. The proposed framework captures complex patterns from DNA sequences through a comprehensive feature extraction process that combines k-mer, Dinucleotide-based Cross Covariance (DCC), Trinucleotide-based Auto Covariance (TAC), Pseudo Single Nucleotide Composition (PseSNC), Pseudo Dinucleotide Composition (PseDNC), and Pseudo Trinucleotide Composition (PseTNC). To improve computational efficiency and eliminate irrelevant or noisy features, an unsupervised Principal Component Analysis (PCA) algorithm is applied to the hybrid feature vector so that the most informative features are retained. A multilayer DNN serves as the classification algorithm to identify N6-methyladenine sites accurately. The robustness and generalizability of Deep-N6mA were validated using fivefold cross-validation on two benchmark datasets. Experimental results show that Deep-N6mA achieves an average accuracy of 97.70% on the F. vesca dataset and 95.75% on the R. chinensis dataset, outperforming existing methods by 4.12% and 4.55%, respectively. These findings underscore the effectiveness of Deep-N6mA as a reliable tool for early 6mA site detection, contributing to epigenetic research and advancing the field of computational biology.
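To make the simplest of the listed descriptors concrete, the following is a minimal sketch of k-mer composition encoding, assuming normalized frequencies over the A/C/G/T alphabet. The function names, the choice of k = 1, 2, 3, and the decision to skip windows containing ambiguous bases are illustrative assumptions; the abstract does not specify these details, and this is not the authors' implementation.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies in lexicographic k-mer order.

    Hypothetical helper: the normalization and the handling of
    ambiguous bases are assumptions, not taken from the paper.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    # Slide a window of width k across the sequence.
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing N or other symbols
            counts[j] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


def hybrid_kmer_features(seq, ks=(1, 2, 3)):
    """Concatenate k-mer frequency vectors for several k (assumed values)."""
    features = []
    for k in ks:
        features.extend(kmer_frequencies(seq, k))
    return features


if __name__ == "__main__":
    vector = hybrid_kmer_features("ACGTACGTAC")
    print(len(vector))  # 4 + 16 + 64 entries
```

In a pipeline like the one described, a vector of this kind would be concatenated with the other descriptors (DCC, TAC, PseSNC, PseDNC, PseTNC) before dimensionality reduction with PCA and classification by the DNN.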

Source journal: BMC Medical Genomics (Medicine: Genetics & Heredity)

CiteScore: 3.90

Self-citation rate: 0.00%

Articles published: 243

Review time: 3.5 months

About the journal: BMC Medical Genomics is an open access journal publishing original peer-reviewed research articles on all aspects of functional genomics, genome structure, genome-scale population genetics, epigenomics, proteomics, systems analysis, and pharmacogenomics in relation to human health and disease.