ProtRNA: A protein-derived RNA language model by cross-modality transfer learning.

IF 7.7
Cell Systems | Pub Date: 2025-09-17 | Epub Date: 2025-08-22 | DOI: 10.1016/j.cels.2025.101371
Ruoxi Zhang, Ben Ma, Gang Xu, Jianpeng Ma
Citations: 0

Abstract

Protein language models (PLMs), such as the highly successful ESM-2, have proven particularly effective. However, language models designed for RNA continue to face challenges. A key question is as follows: can the information derived from PLMs be harnessed and transferred to RNA? To investigate this, a model termed ProtRNA has been developed by a cross-modality transfer learning strategy for addressing the challenges posed by RNA's limited and less conserved sequences. By leveraging the evolutionary and physicochemical information encoded in protein sequences, the ESM-2 model is adapted to processing "low-resource" RNA sequence data. The results show comparable or superior performance in various RNA downstream tasks, with only 1/8 the trainable parameters and 1/6 the training data employed by the primary reference baseline RNA language model. This approach highlights the potential of cross-modality transfer learning in biological language models.
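ESM-2 is a masked language model, so one plausible way to "adapt" it to RNA is to swap the amino-acid vocabulary for a nucleotide vocabulary and continue masked-language-model training on RNA sequences. The sketch below illustrates only that data-preparation step. It assumes ESM-style special-token names, a four-letter A/C/G/U vocabulary, and standard BERT-style 15% masking with an 80/10/10 split. The names `VOCAB`, `tokenize`, and `mask_tokens` are hypothetical, and this abstract does not state ProtRNA's actual tokenization or training objective.

```python
import random

# Hypothetical RNA vocabulary: ESM-style special tokens plus four nucleotides.
# The token names and their order are illustrative assumptions, not ProtRNA's scheme.
SPECIAL = ["<cls>", "<pad>", "<eos>", "<unk>", "<mask>"]
NUCLEOTIDES = ["A", "C", "G", "U"]
VOCAB = {tok: i for i, tok in enumerate(SPECIAL + NUCLEOTIDES)}
SPECIAL_IDS = {VOCAB[t] for t in SPECIAL}


def tokenize(seq: str) -> list[int]:
    """Map an RNA string to token ids, wrapped in <cls> ... <eos>.

    T is read as U (DNA-style input); any other symbol becomes <unk>.
    """
    seq = seq.upper().replace("T", "U")
    ids = [VOCAB.get(ch, VOCAB["<unk>"]) for ch in seq]
    return [VOCAB["<cls>"]] + ids + [VOCAB["<eos>"]]


def mask_tokens(ids, mask_prob=0.15, rng=None):
    """BERT-style masking for masked-language-model training (assumed objective).

    Each non-special position is selected with probability mask_prob. A selected
    position is replaced by <mask> 80% of the time, by a random nucleotide 10% of
    the time, and left unchanged 10% of the time. Returns (inputs, labels); labels
    hold the original id at selected positions and -100 (ignored) elsewhere.
    """
    rng = rng or random.Random(0)
    inputs, labels = list(ids), [-100] * len(ids)
    for i, tok in enumerate(ids):
        # Special tokens (including <unk>) are never selected for prediction.
        if tok in SPECIAL_IDS or rng.random() >= mask_prob:
            continue
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = VOCAB["<mask>"]
        elif r < 0.9:
            inputs[i] = VOCAB[rng.choice(NUCLEOTIDES)]
        # Otherwise the original token is kept as-is.
    return inputs, labels
```

For example, `mask_tokens(tokenize("GGACUUCGGUCC"), rng=random.Random(42))` yields a corrupted input sequence and the labels a model would be trained to recover. Passing an explicit `random.Random` keeps the masking reproducible across runs.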