RNACG: A Universal RNA Sequence Conditional Generation model based on Flow-Matching

Letian Gao, Zhi John Lu
arXiv - QuanBio - Biomolecules, published 2024-07-29. DOI: arxiv-2407.19838 (https://doi.org/arxiv-2407.19838)

Abstract

RNA plays a crucial role in diverse life processes. While protein design methods have advanced rapidly, RNA design remains more challenging. Most current RNA design approaches target specific attributes and rely on extensive experimental searches, which remain costly and inefficient owing to practical constraints. In this paper, we formulate all sequence design problems as conditional generation tasks and provide parameterized representations for multiple problems. To address these problems, we developed RNACG, a universal RNA sequence generation model based on flow matching. RNACG accommodates various conditional inputs and is portable: users can customize the encoding network for their conditional inputs and integrate it into the generation network. We evaluated RNACG on RNA 3D structure inverse folding, 2D structure inverse folding, family-specific sequence generation, and 5'UTR translation efficiency prediction, where it achieves superior or competitive performance relative to other methods. RNACG is broadly applicable to sequence generation and property prediction tasks, providing a novel approach to RNA sequence design and a potential means of running simulation experiments on large-scale RNA sequence data.
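The abstract does not describe RNACG's architecture or training procedure. As a rough illustration of what conditional flow matching over RNA sequences can look like, the Python sketch below uses the common straight-line interpolation path between Gaussian noise and a one-hot sequence, a mean-squared velocity-regression loss, and an Euler sampler. Everything here is an assumption rather than the authors' method: the continuous one-hot representation, the linear path, the sampler, and the hypothetical names `toy_encoder` and `oracle_velocity`. In particular, `oracle_velocity` is the closed-form velocity toward a single known target, standing in for a trained network so the code runs without training.

```python
import random
from typing import Callable, List

ALPHABET = "ACGU"  # one-hot channel order (assumption)
Vector = List[float]  # flattened (L * 4) one-hot representation
VelocityFn = Callable[[Vector, float, Vector], Vector]


def one_hot(seq: str) -> Vector:
    """Flatten an RNA sequence into an (L * 4)-dim one-hot vector."""
    vec: Vector = []
    for base in seq:
        vec.extend(1.0 if base == a else 0.0 for a in ALPHABET)
    return vec


def decode(x: Vector) -> str:
    """Take the argmax of each 4-dim block to recover a nucleotide."""
    out = []
    for i in range(0, len(x), 4):
        block = x[i:i + 4]
        out.append(ALPHABET[block.index(max(block))])
    return "".join(out)


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Straight-line path between noise x0 (t=0) and data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def fm_loss(velocity_fn: VelocityFn, x0: Vector, x1: Vector,
            t: float, cond: Vector) -> float:
    """Mean squared error between v(x_t, t, c) and the path velocity x1 - x0."""
    xt = interpolate(x0, x1, t)
    v = velocity_fn(xt, t, cond)
    target = [b - a for a, b in zip(x0, x1)]
    return sum((vi - ui) ** 2 for vi, ui in zip(v, target)) / len(target)


def sample(velocity_fn: VelocityFn, cond: Vector, length: int,
           steps: int = 20, seed: int = 0) -> str:
    """Integrate dx/dt = v(x, t, c) from Gaussian noise at t=0 to t=1 (Euler)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(4 * length)]
    h = 1.0 / steps
    for k in range(steps):  # t never reaches 1, so oracle_velocity is safe
        t = k * h
        v = velocity_fn(x, t, cond)
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return decode(x)


def toy_encoder(template: str) -> Vector:
    """Hypothetical stand-in for a user-supplied condition encoder;
    here it simply one-hot encodes a template sequence."""
    return one_hot(template)


def oracle_velocity(x: Vector, t: float, cond: Vector) -> Vector:
    """Hypothetical closed-form velocity (x1 - x) / (1 - t) toward a single
    target x1 = cond; a trained conditional network would replace this."""
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, cond)]


if __name__ == "__main__":
    cond = toy_encoder("GGACUCC")
    print(sample(oracle_velocity, cond, length=7))  # GGACUCC
```

In this toy setup, supporting a different kind of condition would only mean replacing `toy_encoder` and training a network to play the role of `velocity_fn`; the loss and sampler stay unchanged. This loosely mirrors the abstract's point that users can plug a custom encoding network into the generation network, but the real RNACG interfaces may differ.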