Learning Sequential and Structural Dependencies Between Nucleotides for RNA N6-Methyladenosine Site Identification

IF 15.3; Zone 1 (Computer Science); JCR Q1 (Automation & Control Systems)

IEEE/CAA Journal of Automatica Sinica. Pub Date: 2024-09-04. DOI: 10.1109/JAS.2024.124233

Guodong Li; Bowei Zhao; Xiaorui Su; Dongxu Li; Yue Yang; Zhi Zeng; Lun Hu

Volume 11, Issue 10, pp. 2123-2134. Journal Article. https://ieeexplore.ieee.org/document/10664519/

Citations: 0

Abstract

N6-methyladenosine (m6A) is an important RNA methylation modification involved in regulating diverse biological processes across multiple species. Identifying m6A modification sites therefore provides valuable insight into the biological mechanisms of complex diseases at the post-transcriptional level. Although a variety of identification algorithms have been proposed recently, most of them capture the features of m6A modification sites by focusing on the sequential dependencies of nucleotides at different positions in RNA sequences, while ignoring the structural dependencies of nucleotides in their three-dimensional structures. To overcome this issue, we propose a cross-species end-to-end deep learning model, namely CR-NSSD, which conducts a cross-domain representation learning process that integrates nucleotide structural and sequential dependencies for RNA m6A site identification. Specifically, CR-NSSD first obtains the pre-coded representations of RNA sequences by incorporating position information into single-nucleotide states with chaos game representation theory. It then constructs a cross-domain reconstruction encoder to learn the sequential and structural dependencies between nucleotides. CR-NSSD is trained for m6A site identification by minimizing the reconstruction and binary cross-entropy losses. Extensive experiments comparing CR-NSSD with several state-of-the-art m6A identification algorithms demonstrate its promising performance. Moreover, the results of cross-species prediction indicate that the integration of sequential and structural dependencies allows CR-NSSD to capture general features of m6A modification sites among different species, thus improving the accuracy of cross-species identification.
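The two ingredients named in the abstract, a chaos game representation (CGR) of the sequence and a training objective that combines reconstruction and binary cross-entropy losses, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The corner layout, the function names `cgr_encode` and `joint_loss`, the use of mean squared error as the reconstruction term, and the `weight` parameter are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed corner layout (a common CGR convention for nucleic acids);
# the layout used by CR-NSSD is not stated in the abstract.
CORNERS: Dict[str, Tuple[float, float]] = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "U": (1.0, 0.0),
}


def cgr_encode(seq: str) -> List[Tuple[float, float]]:
    """Return one (x, y) point per nucleotide via the chaos game iteration.

    Each point depends on all preceding nucleotides, so the coordinate of a
    position carries both its identity and its sequence context.
    """
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points: List[Tuple[float, float]] = []
    for base in seq.upper().replace("T", "U"):
        if base not in CORNERS:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        cx, cy = CORNERS[base]
        # Move halfway toward the corner assigned to the current nucleotide.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def joint_loss(
    x: Sequence[float],
    x_hat: Sequence[float],
    y: Sequence[float],
    p: Sequence[float],
    weight: float = 1.0,  # hypothetical trade-off factor, not from the paper
    eps: float = 1e-12,
) -> float:
    """Binary cross-entropy plus a weighted reconstruction term.

    x / x_hat: input features and their reconstruction (MSE is assumed here).
    y / p: binary labels and predicted m6A probabilities.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    bce = -sum(
        t * math.log(max(q, eps)) + (1.0 - t) * math.log(max(1.0 - q, eps))
        for t, q in zip(y, p)
    ) / len(y)
    return bce + weight * recon


if __name__ == "__main__":
    for base, point in zip("GGACU", cgr_encode("GGACU")):
        print(base, point)
```

Because each CGR point is computed from the previous one, two occurrences of the same nucleotide generally map to different coordinates, which is one way position information can be folded into a single-nucleotide state.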

Source journal

IEEE/CAA Journal of Automatica Sinica (Engineering: Control and Systems Engineering)

CiteScore: 23.50
Self-citation rate: 11.00%
Number of publications: 880

About the journal: The IEEE/CAA Journal of Automatica Sinica publishes papers in English on original theoretical and experimental research and development in the field of automation. Its scope covers automatic control, artificial intelligence and intelligent control, systems theory and engineering, pattern recognition and intelligent systems, automation engineering and applications, information processing and information systems, network-based automation, robotics, sensing and measurement, and navigation, guidance, and control. The journal is abstracted and indexed in SCIE (Science Citation Index Expanded), EI (Engineering Index), Inspec, Scopus, SCImago, DBLP, CNKI (China National Knowledge Infrastructure), CSCD (Chinese Science Citation Database), and IEEE Xplore.