TVAE-RNA: Ensemble-Based RNA Secondary Structure Prediction via Transformer Variational Autoencoders.
Authors: Xiyuan Mei, Hanbo Liu, Yuheng Zhu, Enshuang Zhao, Longyi Li, Hao Zhang
Journal: Bioinformatics (Oxford, England), published 2025-09-22
DOI: 10.1093/bioinformatics/btaf527 (https://doi.org/10.1093/bioinformatics/btaf527)
Motivation: Accurate prediction of RNA secondary structure remains challenging due to the presence of pseudoknots, long-range dependencies, and limited labeled data.
Results: We propose TVAE, a novel framework that integrates a Transformer encoder with a Variational Autoencoder (VAE). The Transformer captures global dependencies in the sequence, while the VAE models structural variability by learning a probabilistic latent space. Unlike deterministic models, TVAE generates diverse and biologically plausible secondary structures, enabling more comprehensive structure discovery. To obtain discrete predictions, we introduce GHA-Pairing, a fast and biologically constrained base-pairing algorithm. TVAE demonstrates strong generalization across different RNA families and achieves state-of-the-art performance on benchmark datasets, reaching an F1 score of 0.89 and 83% accuracy, surpassing existing methods by 10%. These results highlight the advantage of probabilistic modeling for RNA structure prediction and its potential to enhance biological insights.
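Illustration: the abstract does not spell out how GHA-Pairing works, so the following is only a minimal sketch of the general idea of turning a predicted pairing-probability matrix into discrete base pairs under biological constraints. It is a generic greedy decoder, not the authors' algorithm. The function name greedy_pairing, the 0.5 threshold, and the minimum hairpin loop of 3 unpaired bases are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def greedy_pairing(
    seq: str,
    probs: Sequence[Sequence[float]],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
    min_loop: int = 3,       # assumed minimum number of unpaired bases in a hairpin
) -> List[Tuple[int, int]]:
    """Greedily select base pairs (i, j), i < j, from a pairing-probability matrix.

    Constraints enforced: canonical pairs only, a minimum hairpin loop length,
    and at most one partner per base. Crossing pairs are not forbidden, so
    pseudoknotted structures can be returned.
    """
    n = len(seq)
    candidates = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            p = probs[i][j]
            if p >= threshold and (seq[i], seq[j]) in CANONICAL:
                candidates.append((p, i, j))

    # Highest probability first; ties broken by position so the result is deterministic.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    paired = set()
    pairs = []
    for _, i, j in candidates:
        if i not in paired and j not in paired:
            pairs.append((i, j))
            paired.update((i, j))
    return sorted(pairs)


if __name__ == "__main__":
    seq = "GGGAAAUCCC"
    n = len(seq)
    probs = [[0.0] * n for _ in range(n)]
    probs[0][9] = 0.9   # G-C
    probs[1][8] = 0.8   # G-C
    probs[2][7] = 0.7   # G-C
    probs[2][6] = 0.6   # G-U wobble, loses to (2, 7) because base 2 is already paired
    probs[3][5] = 0.95  # A-A, rejected: non-canonical and loop too short
    print(greedy_pairing(seq, probs))  # prints [(0, 9), (1, 8), (2, 7)]
```

In an ensemble setting, a decoder of this kind would be applied once per probability matrix sampled from the model, so that different latent samples can yield different discrete structures.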
Availability and implementation: Code and pretrained models are available at https://github.com/mei-rna/TVAE-RNA. The released version of the dataset and models can also be accessed via DOI: 10.5281/zenodo.16946114.