REDalign: accurate RNA structural alignment using residual encoder-decoder network.

Impact factor: 2.9 · Zone 3 (Biology) · JCR Q2 (Biochemical Research Methods)
Chun-Chi Chen, Yi-Ming Chan, Hyundoo Jeong
BMC Bioinformatics, vol. 25(1), article 346. Published 2024-11-05. DOI: https://doi.org/10.1186/s12859-024-05956-7. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11539752/pdf/
Citations: 0

Abstract

Background: RNA secondary structural alignment serves as a foundational procedure in identifying conserved structural motifs among RNA sequences, crucially advancing our understanding of novel RNAs via comparative genomic analysis. While various computational strategies for RNA structural alignment exist, they often come with high computational complexity. Specifically, when addressing a set of RNAs with unknown structures, the task of simultaneously predicting their consensus secondary structure and determining the optimal sequence alignment requires an overwhelming computational effort of O(L^6) for each RNA pair. Such an extremely high computational complexity makes these methods impractical for large-scale analysis despite their accurate alignment capabilities.
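
To give a sense of scale for the sixth-power cost quoted above, the following back-of-the-envelope sketch counts operations for a single RNA pair. It assumes the cost is exactly L^6 with a constant factor of 1 and a hypothetical throughput of 10^9 operations per second; both are illustrative assumptions, not figures from the paper, and the function names are made up for this example.

```python
# Rough scaling of a Sankoff-style pairwise alignment, assuming cost = L**6
# with constant factor 1 (an illustrative assumption, not a measured value).

OPS_PER_SECOND = 1e9  # hypothetical machine throughput


def sankoff_ops(length: int) -> int:
    """Number of elementary operations for one RNA pair of the given length."""
    return length ** 6


def estimated_seconds(length: int, ops_per_second: float = OPS_PER_SECOND) -> float:
    """Wall-clock estimate under the assumed throughput."""
    return sankoff_ops(length) / ops_per_second


if __name__ == "__main__":
    for length in (50, 100, 200, 400):
        print(f"L={length:4d}  ops={sankoff_ops(length):.2e}  "
              f"~{estimated_seconds(length):.1f} s")
```

Under these assumptions, doubling the sequence length multiplies the work by 2^6 = 64, which is why exact simultaneous folding and alignment quickly becomes impractical for long RNAs or large datasets.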

Results: In this paper, we introduce REDalign, an innovative approach based on deep learning for RNA secondary structural alignment. By utilizing a residual encoder-decoder network, REDalign can efficiently capture consensus structures and optimize structural alignments. In this learning model, the encoder network leverages a hierarchical pyramid to assimilate high-level structural features. Concurrently, the decoder network, enhanced with residual skip connections, integrates multi-level encoded features to learn detailed feature hierarchies with fewer parameter sets. REDalign significantly reduces computational complexity compared to Sankoff-style algorithms and effectively handles non-nested structures, including pseudoknots, which are challenging for traditional alignment methods. Extensive evaluations demonstrate that REDalign provides superior accuracy and substantial computational efficiency.
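
The data flow of an encoder pyramid combined with a decoder that uses additive skip connections can be sketched in a few lines. This is a toy illustration only, not the REDalign architecture: there are no learned weights, average pooling and nearest-neighbour upsampling stand in for the convolutional layers a real network would use, the input is assumed to be a square feature map (for example, a hypothetical base-pairing score matrix) whose side is divisible by 2^levels, and all function names are invented for this example.

```python
# Toy data-flow sketch: encoder pyramid + decoder with additive skip
# connections. No learned weights; pooling and upsampling are stand-ins
# for real layers. All names here are hypothetical.

Matrix = list[list[float]]


def downsample(x: Matrix) -> Matrix:
    """Halve each dimension by 2x2 average pooling (one encoder step)."""
    rows, cols = len(x) // 2, len(x[0]) // 2
    return [
        [
            (x[2 * i][2 * j] + x[2 * i][2 * j + 1]
             + x[2 * i + 1][2 * j] + x[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def upsample(x: Matrix) -> Matrix:
    """Double each dimension by nearest-neighbour repetition (one decoder step)."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used here as the residual skip connection."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def encode_decode(x: Matrix, levels: int) -> Matrix:
    """Run `levels` encoder steps, then decode while adding the stored skips."""
    skips = []
    for _ in range(levels):
        skips.append(x)             # keep the feature map at this resolution
        x = downsample(x)
    for skip in reversed(skips):
        x = add(upsample(x), skip)  # coarse context plus fine detail
    return x
```

The point of the skip connections in this sketch is that each decoder level receives both the upsampled coarse summary and the same-resolution encoder features, so fine detail lost by pooling is reintroduced rather than relearned.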

Conclusion: REDalign presents a significant advancement in RNA secondary structural alignment, balancing high alignment accuracy with lower computational demands. Its ability to handle complex RNA structures, including pseudoknots, makes it an effective tool for large-scale RNA analysis, with potential implications for accelerating discoveries in RNA research and comparative genomics.
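
The non-nested structures mentioned here can be made concrete with the standard definition of a pseudoknot: two base pairs (i, j) and (k, l) cross when i < k < j < l. The sketch below checks a list of base pairs for such a crossing. It is a generic illustration under that definition, with a hypothetical function name, and is not taken from the REDalign implementation.

```python
def has_pseudoknot(pairs):
    """Return True if any two base pairs cross, i.e. i < k < j < l."""
    # Normalize each pair so the smaller index comes first, then sort by it.
    normalized = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for a, (i, j) in enumerate(normalized):
        for k, l in normalized[a + 1:]:
            if i < k < j < l:
                return True
    return False


if __name__ == "__main__":
    nested = [(0, 9), (1, 8), (2, 7)]   # a simple stem: fully nested
    crossing = [(0, 5), (3, 8)]         # pairs interleave: a pseudoknot
    print(has_pseudoknot(nested), has_pseudoknot(crossing))
```

This quadratic scan is adequate for illustration; the nested case is what classical dynamic-programming recursions assume, while the crossing case is what they cannot represent directly.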

Source journal: BMC Bioinformatics (Biology, Biochemical Research Methods)
CiteScore: 5.70
Self-citation rate: 3.30%
Articles published: 506
Review time: 4.3 months
About the journal: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.