REDalign: accurate RNA structural alignment using residual encoder-decoder network.

IF 2.9 3区生物学 Q2 BIOCHEMICAL RESEARCH METHODS

BMC Bioinformatics Pub Date : 2024-11-05 DOI:10.1186/s12859-024-05956-7

Chun-Chi Chen, Yi-Ming Chan, Hyundoo Jeong

{"title":"REDalign: accurate RNA structural alignment using residual encoder-decoder network.","authors":"Chun-Chi Chen, Yi-Ming Chan, Hyundoo Jeong","doi":"10.1186/s12859-024-05956-7","DOIUrl":null,"url":null,"abstract":"Background: RNA secondary structural alignment serves as a foundational procedure in identifying conserved structural motifs among RNA sequences, crucially advancing our understanding of novel RNAs via comparative genomic analysis. While various computational strategies for RNA structural alignment exist, they often come with high computational complexity. Specifically, when addressing a set of RNAs with unknown structures, the task of simultaneously predicting their consensus secondary structure and determining the optimal sequence alignment requires an overwhelming computational effort of <math><mrow><mi>O</mi> <mo>(</mo> <msup><mi>L</mi> <mn>6</mn></msup> <mo>)</mo></mrow> </math> for each RNA pair. Such an extremely high computational complexity makes these methods impractical for large-scale analysis despite their accurate alignment capabilities.Results: In this paper, we introduce REDalign, an innovative approach based on deep learning for RNA secondary structural alignment. By utilizing a residual encoder-decoder network, REDalign can efficiently capture consensus structures and optimize structural alignments. In this learning model, the encoder network leverages a hierarchical pyramid to assimilate high-level structural features. Concurrently, the decoder network, enhanced with residual skip connections, integrates multi-level encoded features to learn detailed feature hierarchies with fewer parameter sets. REDalign significantly reduces computational complexity compared to Sankoff-style algorithms and effectively handles non-nested structures, including pseudoknots, which are challenging for traditional alignment methods. Extensive evaluations demonstrate that REDalign provides superior accuracy and substantial computational efficiency.Conclusion: REDalign presents a significant advancement in RNA secondary structural alignment, balancing high alignment accuracy with lower computational demands. Its ability to handle complex RNA structures, including pseudoknots, makes it an effective tool for large-scale RNA analysis, with potential implications for accelerating discoveries in RNA research and comparative genomics.","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"346"},"PeriodicalIF":2.9000,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11539752/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12859-024-05956-7","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}

引用次数: 0

Abstract

Background: RNA secondary structural alignment serves as a foundational procedure in identifying conserved structural motifs among RNA sequences, crucially advancing our understanding of novel RNAs via comparative genomic analysis. While various computational strategies for RNA structural alignment exist, they often come with high computational complexity. Specifically, when addressing a set of RNAs with unknown structures, the task of simultaneously predicting their consensus secondary structure and determining the optimal sequence alignment requires an overwhelming computational effort of $O (L^{6})$ for each RNA pair. Such an extremely high computational complexity makes these methods impractical for large-scale analysis despite their accurate alignment capabilities.

Results: In this paper, we introduce REDalign, an innovative approach based on deep learning for RNA secondary structural alignment. By utilizing a residual encoder-decoder network, REDalign can efficiently capture consensus structures and optimize structural alignments. In this learning model, the encoder network leverages a hierarchical pyramid to assimilate high-level structural features. Concurrently, the decoder network, enhanced with residual skip connections, integrates multi-level encoded features to learn detailed feature hierarchies with fewer parameter sets. REDalign significantly reduces computational complexity compared to Sankoff-style algorithms and effectively handles non-nested structures, including pseudoknots, which are challenging for traditional alignment methods. Extensive evaluations demonstrate that REDalign provides superior accuracy and substantial computational efficiency.

Conclusion: REDalign presents a significant advancement in RNA secondary structural alignment, balancing high alignment accuracy with lower computational demands. Its ability to handle complex RNA structures, including pseudoknots, makes it an effective tool for large-scale RNA analysis, with potential implications for accelerating discoveries in RNA research and comparative genomics.

查看原文本刊更多论文

REDalign：利用残差编码器-解码器网络进行精确的 RNA 结构配准。

背景：RNA 二级结构比对是识别 RNA 序列中保守结构模式的基础程序，可通过比较基因组分析加深我们对新型 RNA 的理解。虽然存在各种用于 RNA 结构比对的计算策略，但它们往往具有很高的计算复杂性。具体来说，在处理一组结构未知的 RNA 时，同时预测它们的共识二级结构和确定最佳序列比对的任务需要对每对 RNA 进行 O ( L 6 ) 的计算。这样极高的计算复杂度使得这些方法尽管具有精确的比对能力，但在大规模分析中并不实用：在本文中，我们介绍了 REDalign，一种基于深度学习的 RNA 二级结构配准创新方法。通过利用残差编码器-解码器网络，REDalign 可以有效捕捉共识结构并优化结构配准。在这种学习模型中，编码器网络利用分层金字塔吸收高级结构特征。同时，解码器网络通过残余跳转连接进行增强，整合多层次编码特征，以更少的参数集学习详细的特征层次。与 Sankoff 算法相比，REDalign 大大降低了计算复杂度，并能有效处理非嵌套结构，包括对传统配准方法具有挑战性的伪节点。广泛的评估结果表明，REDalign 具有卓越的准确性和可观的计算效率：REDalign 在 RNA 二级结构配准方面取得了重大进展，在高配准精度和低计算需求之间实现了平衡。REDalign 能够处理复杂的 RNA 结构（包括假结点），是进行大规模 RNA 分析的有效工具，对加速 RNA 研究和比较基因组学的发现具有潜在意义。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

BMC Bioinformatics 生物-生化研究方法

CiteScore

5.70

自引率

3.30%

发文量

506

审稿时长

4.3 months

期刊介绍： BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.