A foundation language model to decipher diverse regulation of RNAs
Authors: Hanwen Zhou, Yue Hu, Yulong Zheng, Jiefu Li, Jielong Peng, Jiang Hu, Yun Yang, Wei Chen, Guoqing Zhang, Zefeng Wang
Journal: Genome Biology
Publication date: 2025-09-24
Publication type: Journal Article
DOI: 10.1186/s13059-025-03752-x (https://doi.org/10.1186/s13059-025-03752-x)
Citations: 0
Abstract
RNA metabolism is tightly regulated by cis-elements and trans-acting factors, and most of the information guiding this regulation is encoded in RNA sequences. Deciphering the regulatory rules is critical for RNA biology and therapeutics; however, predicting diverse modes of regulation from RNA sequence remains a formidable challenge. Considering the similarities in semantic and syntactic features between RNAs and human language, we present LAMAR, a transformer-based foundation LAnguage Model for RNA Regulation, to decipher general rules underlying RNA processing. The model is pretrained on approximately 15 million sequences from the genomes and transcriptomes of 225 mammals and 1569 viruses, and then fine-tuned on labeled datasets for various tasks. The resulting fine-tuned models outperform state-of-the-art methods in predicting mRNA translation efficiency and mRNA half-life, and achieve accuracy comparable to that of purpose-built methods in predicting splice sites of pre-mRNAs and internal ribosome entry sites (IRESs). Fine-tuned LAMAR is further applied to predict the mutational effects of cis-regulatory elements, revealing known and novel regulatory elements that modulate RNA degradation. It is also applied in an in silico screen for novel IRESs, identifying highly active IRESs that promote circRNA translation. Our results indicate that a single foundation language model can support comprehensive analysis of different aspects of RNA regulation and predictive identification of novel regulatory elements, providing new insight into the design and optimization of RNA drugs.
Genome Biology
Biochemistry, Genetics and Molecular Biology - Genetics
CiteScore
21.00
Self-citation rate
3.30%
Articles published
241
Review time
2 months
About the journal:
Genome Biology stands as a premier platform for exceptional research across all domains of biology and biomedicine, explored through a genomic and post-genomic lens.
With an impact factor of 12.3 (2022), the journal ranks 3rd among research journals in the Genetics and Heredity category and 2nd among research journals in the Biotechnology and Applied Microbiology category, according to Thomson Reuters. Notably, Genome Biology is the highest-ranked open-access journal in this category.
Our team of highly trained in-house Editors works closely with our Editorial Board of international experts to keep the journal at the forefront of scientific advances and community standards. Regular engagement with researchers at conferences and institute visits underscores our commitment to staying abreast of the latest developments in the field.