Learning Sequential and Structural Dependencies Between Nucleotides for RNA N6-Methyladenosine Site Identification

IF 15.3 · CAS Zone 1 (Computer Science) · JCR Q1 (AUTOMATION & CONTROL SYSTEMS)
Guodong Li;Bowei Zhao;Xiaorui Su;Dongxu Li;Yue Yang;Zhi Zeng;Lun Hu
DOI: 10.1109/JAS.2024.124233
Journal: IEEE/CAA Journal of Automatica Sinica, vol. 11, no. 10, pp. 2123-2134
Published: 2024-09-04
Publication type: Journal Article
URL: https://ieeexplore.ieee.org/document/10664519/
Citations: 0

Abstract

N6-methyladenosine (m6A) is an important RNA methylation modification involved in regulating diverse biological processes across multiple species. Hence, identifying m6A modification sites provides valuable insight into the biological mechanisms of complex diseases at the post-transcriptional level. Although a variety of identification algorithms have been proposed recently, most of them capture the features of m6A modification sites by focusing on the sequential dependencies of nucleotides at different positions in RNA sequences, while ignoring the structural dependencies of nucleotides in their three-dimensional structures. To overcome this issue, we propose a cross-species end-to-end deep learning model, namely CR-NSSD, which conducts a cross-domain representation learning process that integrates nucleotide structural and sequential dependencies for RNA m6A site identification. Specifically, CR-NSSD first obtains pre-coded representations of RNA sequences by incorporating position information into single-nucleotide states using chaos game representation theory. It then constructs a cross-domain reconstruction encoder to learn the sequential and structural dependencies between nucleotides. CR-NSSD is trained for m6A site identification by minimizing the reconstruction and binary cross-entropy losses. Extensive experiments comparing CR-NSSD with several state-of-the-art m6A identification algorithms demonstrate its promising performance. Moreover, the results of cross-species prediction indicate that integrating sequential and structural dependencies allows CR-NSSD to capture general features of m6A modification sites among different species, thus improving the accuracy of cross-species identification.
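The pre-coding step mentioned in the abstract relies on chaos game representation (CGR). The abstract does not give the exact formulation used by CR-NSSD, so the following is only a minimal sketch of the classic CGR iteration, in which each nucleotide moves the current point halfway toward that nucleotide's corner of the unit square. The corner assignment, the starting point (0.5, 0.5), and the function name `cgr_encode` are assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Tuple

# Assumed corner assignment (a common CGR convention); the paper's
# actual mapping is not stated in the abstract.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "U": (1.0, 0.0),
}


def cgr_encode(seq: str) -> List[Tuple[float, float]]:
    """Return one CGR coordinate per nucleotide of an RNA sequence.

    Each coordinate depends on the current nucleotide and on all
    preceding ones, so position information is folded into the
    per-nucleotide state.
    """
    x, y = 0.5, 0.5  # assumed starting point: centre of the unit square
    points = []
    # Accept DNA-style input by treating T as U.
    for base in seq.upper().replace("T", "U"):
        if base not in CORNERS:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for base, point in zip("GACU", cgr_encode("GACU")):
        print(base, point)
```

Under these assumptions, two sequences that end in the same nucleotide but differ earlier map to different points, which is one way a single-nucleotide state can carry positional context.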
Source journal
IEEE/CAA Journal of Automatica Sinica (Engineering: Control and Systems Engineering)
CiteScore: 23.50
Self-citation rate: 11.00%
Articles published: 880
Journal description: The IEEE/CAA Journal of Automatica Sinica is a reputable journal that publishes high-quality papers in English on original theoretical/experimental research and development in the field of automation. The journal covers a wide range of topics including automatic control, artificial intelligence and intelligent control, systems theory and engineering, pattern recognition and intelligent systems, automation engineering and applications, information processing and information systems, network-based automation, robotics, sensing and measurement, and navigation, guidance, and control. The journal is abstracted/indexed in several prominent databases including SCIE (Science Citation Index Expanded), EI (Engineering Index), Inspec, Scopus, SCImago, DBLP, CNKI (China National Knowledge Infrastructure), CSCD (Chinese Science Citation Database), and IEEE Xplore.