TVAE-RNA: Ensemble-Based RNA Secondary Structure Prediction via Transformer Variational Autoencoders
Authors: Xiyuan Mei, Hanbo Liu, Yuheng Zhu, Enshuang Zhao, Longyi Li, Hao Zhang
Journal: Bioinformatics (Oxford, England)
Publication date: 2025-09-22
DOI: 10.1093/bioinformatics/btaf527 (https://doi.org/10.1093/bioinformatics/btaf527)
Citations: 0
Abstract
Motivation: Accurate prediction of RNA secondary structure remains challenging due to the presence of pseudoknots, long-range dependencies, and limited labeled data.
Results: We propose TVAE, a novel framework that integrates a Transformer encoder with a Variational Autoencoder (VAE). The Transformer captures global dependencies in the sequence, while the VAE models structural variability by learning a probabilistic latent space. Unlike deterministic models, TVAE generates diverse and biologically plausible secondary structures, enabling more comprehensive structure discovery. To obtain discrete predictions, we introduce GHA-Pairing, a fast and biologically constrained base-pairing algorithm. TVAE demonstrates strong generalization across different RNA families and achieves state-of-the-art performance on benchmark datasets, reaching an F1 score of 0.89 and 83% accuracy, surpassing existing methods by 10%. These results highlight the advantage of probabilistic modeling for RNA structure prediction and its potential to enhance biological insights.
Availability and implementation: Code and pretrained models are available at https://github.com/mei-rna/TVAE-RNA. The released version of the dataset and models can also be accessed via DOI: 10.5281/zenodo.16946114.
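
The abstract does not describe how GHA-Pairing works internally, so the following is only an illustrative sketch of the general idea of turning a matrix of predicted pairing probabilities into a discrete set of base pairs under biological constraints. It is a generic greedy decoder, not the authors' algorithm. The function name `greedy_pairing`, the `threshold` and `min_loop` parameters, and the choice of constraints (canonical and G-U wobble pairs only, at most one partner per base, a minimum hairpin loop length) are all assumptions made for illustration.

```python
from typing import List, Tuple

# Canonical Watson-Crick pairs plus the G-U wobble pair (an assumed constraint set).
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def greedy_pairing(
    seq: str,
    probs: List[List[float]],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
    min_loop: int = 3,       # minimum number of unpaired bases between i and j
) -> List[Tuple[int, int]]:
    """Hypothetical greedy decoder; NOT the paper's GHA-Pairing.

    probs[i][j] is read as the predicted probability that base i pairs
    with base j (only entries with i < j are used).
    """
    n = len(seq)
    candidates = []
    for i in range(n):
        # Enforce the minimum loop length by starting j far enough from i.
        for j in range(i + min_loop + 1, n):
            if probs[i][j] >= threshold and (seq[i], seq[j]) in ALLOWED:
                candidates.append((probs[i][j], i, j))

    # Highest probability first; ties are broken by position for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    used = set()
    pairs = []
    for _, i, j in candidates:
        # Each base may take part in at most one pair.
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return sorted(pairs)


if __name__ == "__main__":
    seq = "GGGAAACCC"
    n = len(seq)
    probs = [[0.0] * n for _ in range(n)]
    probs[0][8], probs[1][7], probs[2][6] = 0.9, 0.8, 0.7
    probs[0][7] = 0.6  # conflicts with the stronger (0, 8) pair
    print(greedy_pairing(seq, probs))
```

Because the sketch imposes no nesting constraint, crossing pairs (pseudoknots) can survive decoding, which matters for the pseudoknotted structures the abstract mentions. Running the decoder on several probability matrices sampled from a probabilistic model would yield an ensemble of candidate structures rather than a single prediction.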