Sia-m7G: Predicting m7G Sites through the Siamese Neural Network with an Attention Mechanism

IF 2.9, Zone 3 (Biology), JCR Q3 (Biochemical Research Methods)

Current Bioinformatics Pub Date : 2024-02-09 DOI:10.2174/0115748936285540240116065719

Jia Zheng, Yetong Zhou


Citations: 0

Abstract

Background: The chemical modification of RNA plays a crucial role in many biological processes. N7-methylguanosine (m7G), one of the most important epigenetic modifications, plays an important role in gene expression, processing metabolism, and protein synthesis. Detecting the exact location of m7G sites in the transcriptome is key to understanding their mechanism in gene expression. On the basis of experimentally validated data, several machine learning and deep learning tools have been designed to identify internal m7G sites and have shown advantages over traditional experimental methods in terms of speed, cost-effectiveness and robustness.

Aims: In this study, we aim to develop a computational model to help predict the exact location of m7G sites in humans.

Objective: Simple and advanced encoding methods and deep learning networks are designed to achieve excellent m7G prediction efficiently.

Methods: Three feature extraction methods and six classification algorithms were tested to identify m7G sites. Our final model, named Sia-m7G, adopts one-hot encoding and a delicate Siamese neural network with an attention mechanism. In addition, multiple 10-fold cross-validation tests were conducted to evaluate our predictor.

Results: Sia-m7G achieved the highest sensitivity, specificity and accuracy on 10-fold cross-validation tests compared with the other six m7G predictors. Nucleotide preference and model visualization analyses were conducted to strengthen the interpretability of Sia-m7G and provide a further understanding of m7G site fragments in genomic sequences.

Conclusion: Sia-m7G has significant advantages over other classifiers and predictors, which proves the superiority of the Siamese neural network algorithm in identifying m7G sites.
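The abstract names one-hot encoding as the input representation and sensitivity, specificity and accuracy as the evaluation metrics, but gives no implementation details. The following is a minimal sketch of both steps, not the authors' code. The alphabet order `ACGU`, the all-zero row for unknown bases, and the function names `one_hot_encode` and `binary_metrics` are assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Assumed alphabet order; the abstract does not specify one.
NUCLEOTIDES = "ACGU"


def one_hot_encode(seq: str) -> List[List[int]]:
    """Encode an RNA fragment as a len(seq) x 4 binary matrix.

    DNA-style 'T' is mapped to 'U'. Any other unknown base (e.g. 'N')
    becomes an all-zero row, which is an illustrative choice only.
    """
    rows = []
    for base in seq.upper().replace("T", "U"):
        row = [0, 0, 0, 0]
        if base in NUCLEOTIDES:
            row[NUCLEOTIDES.index(base)] = 1
        rows.append(row)
    return rows


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity, specificity and accuracy from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        # Fraction of true m7G sites that were predicted as m7G.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Fraction of non-m7G sites that were predicted as non-m7G.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # Fraction of all predictions that were correct.
        "accuracy": (tp + tn) / total if total else 0.0,
    }
```

In a 10-fold cross-validation setting, `binary_metrics` would typically be applied to the held-out fold of each split and the per-fold values averaged. How the paper aggregates its folds is not stated in the abstract.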


Source journal

Current Bioinformatics (Biology: Biochemical Research Methods)

CiteScore

6.60

Self-citation rate

2.50%

Number of articles published

Review time

>12 weeks

Journal description: Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely, in-depth/mini-reviews, research papers and guest-edited thematic issues written by leaders in the field, covering a wide range of topics in the integration of biology with computer and information science. The journal focuses on advances in computational molecular/structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.