UTRGAN: learning to generate 5' UTR sequences for optimized translation efficiency and gene expression.

Impact factor: 2.8 · JCR Q2 (Mathematical & Computational Biology)
Bioinformatics Advances · Pub Date: 2025-06-10 · eCollection Date: 2025-01-01 · DOI: 10.1093/bioadv/vbaf134
Sina Barazandeh, Furkan Ozden, Ahmet Hincer, Urartu Ozgur Safak Seker, A Ercument Cicek
Volume 5, issue 1, article vbaf134. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12228966/pdf/

Abstract

Motivation: The 5' untranslated region (5' UTR) of mRNA is crucial for the molecule's translatability and stability, making it essential for designing synthetic biological circuits for high and stable protein expression. Several UTR sequences are patented and widely used in laboratories. This paper presents UTRGAN, a Generative Adversarial Network (GAN)-based model for generating 5' UTR sequences, coupled with an optimization procedure to ensure high expression for target gene sequences or high ribosome load and translation efficiency.
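The abstract does not detail the optimization procedure. As a minimal sketch of the general idea (searching a generator's input space for a latent vector whose decoded sequence scores well under a predictor), the toy below uses random-perturbation hill climbing. `toy_generator`, `toy_score`, and every parameter are hypothetical stand-ins; they are not UTRGAN's trained networks, its predictors, or its actual optimizer.

```python
import random

BASES = "ACGU"

def toy_generator(z, length=20):
    """Hypothetical stand-in for a trained generator: decode latent vector z into an RNA string.

    Each position gets four pseudo-logits from a fixed mixing of z; the argmax picks the base.
    """
    seq = []
    n = len(z)
    for pos in range(length):
        logits = [z[(pos + b) % n] * (b + 1) + z[(pos * 3 + b) % n] for b in range(4)]
        seq.append(BASES[logits.index(max(logits))])
    return "".join(seq)

def toy_score(seq):
    """Hypothetical stand-in for an expression predictor (higher is better).

    Penalizes AUG triplets and GC content; purely illustrative, not a biological model.
    """
    gc = sum(seq.count(b) for b in "GC") / len(seq)
    return -seq.count("AUG") - gc

def optimize_latent(steps=200, dim=16, sigma=0.3, seed=0):
    """Random-perturbation hill climbing over z; a move is kept only if the score does not drop."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    initial = toy_generator(z)
    best = toy_score(initial)
    for _ in range(steps):
        cand = [v + rng.gauss(0.0, sigma) for v in z]
        s = toy_score(toy_generator(cand))
        if s >= best:
            z, best = cand, s
    return initial, toy_generator(z), best

initial_seq, optimized_seq, final_score = optimize_latent()
print(initial_seq, toy_score(initial_seq))
print(optimized_seq, final_score)
```

Because a candidate is accepted only when its score is at least the current best, the final score can never fall below the score of the initial sequence. A real pipeline would replace both toy functions with trained models.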

Results: The model generates sequences that mimic various properties of natural UTR sequences and optimizes them to achieve (i) up to five-fold higher average predicted expression on target genes, (ii) up to two-fold higher predicted mean ribosome load, and (iii) 34-fold higher average predicted translation efficiency compared to the initial UTR sequences. UTRGAN-generated sequences also exhibit higher similarity to known regulatory motifs in regions such as internal ribosome entry sites, upstream open reading frames, G-quadruplexes, and Kozak and start codon regions. In vitro experiments show that the UTR sequences designed by UTRGAN result in a higher translation rate for the human TNF-α protein compared to the human beta-globin 5' UTR, a UTR with high production capacity.
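Two of the motif classes named above can be checked with simple string scans. The sketch below finds upstream AUGs that are followed by an in-frame stop codon inside the UTR (a contained upstream open reading frame) and tests the two best-known positions of the Kozak consensus around the main start codon: a purine at -3 and a G at +4. The function names and the example sequences are illustrative assumptions and are not taken from the UTRGAN code base.

```python
STOPS = {"UAA", "UAG", "UGA"}

def upstream_orfs(utr):
    """Return (start, end) index pairs of AUG...stop ORFs fully contained in the 5' UTR."""
    utr = utr.upper().replace("T", "U")
    orfs = []
    for start in range(len(utr) - 2):
        if utr[start:start + 3] != "AUG":
            continue
        # Walk codon by codon, in frame with this AUG, until a stop codon is found.
        for pos in range(start + 3, len(utr) - 2, 3):
            if utr[pos:pos + 3] in STOPS:
                orfs.append((start, pos + 3))
                break
    return orfs

def strong_kozak(utr, cds):
    """True if the main AUG has a purine at -3 and a G at +4 (the A of AUG is +1)."""
    utr = utr.upper().replace("T", "U")
    cds = cds.upper().replace("T", "U")
    if len(utr) < 3 or len(cds) < 4 or not cds.startswith("AUG"):
        return False
    return utr[-3] in "AG" and cds[3] == "G"

utr = "GGAUGCCCUAAGCCACC"  # made-up UTR with one short upstream ORF
cds = "AUGGCU"             # made-up start of a coding sequence
print(upstream_orfs(utr))
print(strong_kozak(utr, cds))
```

Scans like these only flag sequence features; they say nothing about how strongly a given motif affects translation in a particular context.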

Availability and implementation: The source code, including the model implementation and the optimization procedure, is released at http://github.com/ciceklab/UTRGAN. The dataset was downloaded from the UTRdb 2.0 database and is available in the GitHub repository.
