Sina Barazandeh, Furkan Ozden, Ahmet Hincer, Urartu Ozgur Safak Seker, A Ercument Cicek
Bioinformatics Advances, 5(1), vbaf134. Published 2025-06-10. DOI: https://doi.org/10.1093/bioadv/vbaf134. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12228966/pdf/
UTRGAN: learning to generate 5' UTR sequences for optimized translation efficiency and gene expression.
Motivation: The 5' untranslated region (5' UTR) of mRNA is crucial for the molecule's translatability and stability, making it essential for designing synthetic biological circuits for high and stable protein expression. Several UTR sequences are patented and widely used in laboratories. This paper presents UTRGAN, a Generative Adversarial Network (GAN)-based model for generating 5' UTR sequences, coupled with an optimization procedure to ensure high expression for target gene sequences or high ribosome load and translation efficiency.
Results: The model generates sequences mimicking various properties of natural UTR sequences and optimizes them to achieve (i) up to five-fold higher average predicted expression on target genes, (ii) up to two-fold higher predicted mean ribosome load, and (iii) a 34-fold higher average predicted translation efficiency compared to initial UTR sequences. UTRGAN-generated sequences also exhibit higher similarity to known regulatory motifs in regions such as internal ribosome entry sites, upstream open reading frames, G-quadruplexes, and Kozak and initiation start codon regions. In vitro experiments show that the UTR sequences designed by UTRGAN result in a higher translation rate for the human TNF-α protein compared to the human beta-globin 5' UTR, a UTR with high production capacity.
Availability and implementation: The source code, including the model implementation and the optimization procedure, is released at http://github.com/ciceklab/UTRGAN. The dataset was downloaded from the UTRdb 2.0 database and is available within the GitHub repository.
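For readers unfamiliar with the regulatory motifs named in the Results, the following is a minimal illustrative sketch of how two of them can be located in a 5' UTR string: upstream start codons (a prerequisite for upstream open reading frames) and putative G-quadruplexes. It is not taken from the UTRGAN code base. The function names are hypothetical, and the G-quadruplex regular expression is the commonly used canonical pattern (four runs of at least three G's separated by loops of 1 to 7 nucleotides), which is an assumption here rather than the criterion used in the paper.

```python
import re

# Canonical putative G-quadruplex pattern (assumption, not from the paper):
# four runs of >= 3 G's separated by three loops of 1-7 nucleotides.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def normalize(utr):
    """Upper-case the sequence and map RNA 'U' to DNA 'T'."""
    return utr.upper().replace("U", "T")


def upstream_start_codons(utr):
    """Return 0-based positions of every ATG/AUG inside the 5' UTR."""
    seq = normalize(utr)
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def has_putative_g4(utr):
    """Return True if the sequence contains a canonical G-quadruplex motif."""
    return G4_PATTERN.search(normalize(utr)) is not None


if __name__ == "__main__":
    # Hypothetical toy sequence, chosen only to exercise both checks.
    example = "GGGAGGGTTGGGCGGGACCATGGCC"
    print("upstream start codons at:", upstream_start_codons(example))
    print("putative G-quadruplex:", has_putative_g4(example))
```

Checks of this kind only flag candidate motifs. Whether a candidate is functional depends on sequence context and structure that a pattern match does not capture.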