RNACG: A Universal RNA Sequence Conditional Generation model based on Flow-Matching
Authors: Letian Gao, Zhi John Lu
arXiv - QuanBio - Biomolecules, 2024-07-29
DOI: arxiv-2407.19838 (https://doi.org/arxiv-2407.19838)
Citation count: 0
Abstract
RNA plays a crucial role in diverse life processes. In contrast to the rapid
advancement of protein design methods, RNA design remains more
demanding. Most current RNA design approaches concentrate on specified target
attributes and rely on extensive experimental searches. However, these methods
remain costly and inefficient due to practical limitations. In this paper, we
frame all sequence design problems as conditional generation tasks and
offer parameterized representations for multiple problems. For these problems,
we develop a universal RNA sequence generation model based on flow
matching, named RNACG. RNACG can accommodate various conditional inputs and is
portable, enabling users to customize the encoding network for conditional
inputs as per their requirements and integrate it into the generation network.
We evaluated RNACG on RNA 3D structure inverse folding, 2D structure inverse
folding, family-specific sequence generation, and 5'UTR translation efficiency
prediction. RNACG attains superior or competitive performance on these tasks
compared with other methods. RNACG exhibits extensive applicability in sequence
generation and property prediction tasks, providing a novel approach to RNA
sequence design and potential methods for simulation experiments with
large-scale RNA sequence data.
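
To make the flow-matching idea mentioned above concrete, here is a minimal sketch of how one training example could be built for a sequence model. It is not the RNACG implementation: the abstract does not specify the probability path, the noise distribution, or the sequence encoding, so the one-hot encoding, Gaussian noise, and straight-line path below are all assumptions, and every function name is hypothetical. The conditional input (for example an encoded structure) would be an extra argument to the network being trained and is left out here.

```python
import random

ALPHABET = "ACGU"  # RNA nucleotides

def one_hot(seq):
    """Encode an RNA string as a list of 4-dimensional one-hot rows."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]

def flow_matching_pair(seq, t, rng):
    """Build one training example for a straight-line flow-matching objective.

    Assumed setup (not taken from the paper): x1 is the one-hot data point,
    x0 is Gaussian noise, x_t lies on the line between them at time t, and
    the regression target is the constant velocity x1 - x0 along that line.
    """
    x1 = one_hot(seq)
    x0 = [[rng.gauss(0.0, 1.0) for _ in ALPHABET] for _ in seq]
    x_t = [[(1.0 - t) * a + t * b for a, b in zip(r0, r1)] for r0, r1 in zip(x0, x1)]
    target = [[b - a for a, b in zip(r0, r1)] for r0, r1 in zip(x0, x1)]
    return x_t, target

def squared_error(pred, target):
    """Mean squared error between a predicted and a target velocity field."""
    diffs = [(p - q) ** 2 for rp, rq in zip(pred, target) for p, q in zip(rp, rq)]
    return sum(diffs) / len(diffs)

def decode(x):
    """Map each row to its highest-scoring nucleotide."""
    return "".join(ALPHABET[max(range(len(ALPHABET)), key=row.__getitem__)] for row in x)
```

In such a setup, a network taking `x_t`, `t`, and the condition would be trained to minimize `squared_error` against `target`. With the exact velocity, stepping from `x_t` by `(1 - t) * target` lands back on the one-hot sequence, which `decode` turns into a string.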