Machine learning-based analysis of the impact of 5' untranslated region on protein expression
Authors: Linfeng Wang, Sujia Liu, Jia-Xin Huang, Haifeng Zhu, Shuyu Li, Yannan Li, Sen Chen, Jianying Han, Yin Zhu, Jiahao Wu, Wentao Liao, Hongmei Zhang, Haiyan Zeng, Shaoting Li, Shuping Zhao, Bingwei Wang, Jiaqi Lin, Ji Zeng
Journal: Nucleic Acids Research (impact factor 13.1; JCR Q1, Biochemistry & Molecular Biology)
Publication date: 2025-09-05
Publication type: Journal Article
DOI: 10.1093/nar/gkaf861 (https://doi.org/10.1093/nar/gkaf861)
Citation count: 0
Abstract
The 5' untranslated region (5'UTR) plays a crucial regulatory role in messenger RNA (mRNA), and modified 5'UTRs are widely used in applications such as vaccine production and gene therapy. However, manual optimization of 5'UTRs makes it difficult to balance the effects of the various cis-elements. Consequently, multiple 5'UTR libraries have been created, and machine learning models have been used to analyze and predict translation efficiency (TE) and protein expression, providing insights into critical regulatory features. These screening libraries, which are based on TE and mean ribosome load, struggle to quantify protein expression accurately, whereas a more precise method for quantifying 5'UTRs requires a significantly costlier library. To resolve this dilemma, we constructed a library that uses firefly luciferase as the reporter to measure protein expression accurately. In addition, we optimized library construction by clustering mRNA sequences to reduce redundant data and minimize dataset size. This dual strategy of increasing accuracy and reducing dataset size was found to be effective in predicting the 5'UTRs from the PC3 cell line.
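The abstract does not say which clustering method was used to reduce redundancy, so the following is only an illustrative sketch of the general idea: group near-identical sequences and keep one representative per group. It assumes a greedy scheme based on Jaccard similarity of k-mer sets. The function names, the k-mer size of 4, the 0.6 threshold, and the example sequences are all hypothetical and are not taken from the paper.

```python
from typing import List, Set


def kmers(seq: str, k: int = 4) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_representatives(seqs: List[str], k: int = 4,
                         threshold: float = 0.6) -> List[str]:
    """Greedy clustering: keep a sequence only if its similarity to every
    previously kept representative is below the threshold."""
    reps: List[str] = []
    rep_kmers: List[Set[str]] = []
    for seq in seqs:
        ks = kmers(seq, k)
        if all(jaccard(ks, rk) < threshold for rk in rep_kmers):
            reps.append(seq)
            rep_kmers.append(ks)
    return reps


# Made-up example sequences (assumption: not from the paper's library).
utrs = [
    "GGGACAUUUGCUUCUGACACAACUGUGUUCACUAGC",
    "GGGACAUUUGCUUCUGACACAACUGUGUUCACUAGG",  # one-base variant of the first
    "AUGCCCGGGAAAUUUCCCGGGAAAUUUCCCGGGAAA",
]
representatives = pick_representatives(utrs)
print(len(representatives))
```

In this toy input the second sequence differs from the first by a single base, so it falls into the first sequence's cluster and is dropped, while the third sequence is kept as its own representative. A real library design would more likely rely on a dedicated sequence-clustering tool and a threshold chosen empirically.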
About the journal:
Nucleic Acids Research (NAR) is a scientific journal that publishes research on various aspects of nucleic acids and proteins involved in nucleic acid metabolism and interactions. It covers areas such as chemistry and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, genome integrity, repair and replication, genomics, molecular biology, nucleic acid enzymes, RNA, and structural biology. The journal also includes a Survey and Summary section for brief reviews. Additionally, each year, the first issue is dedicated to biological databases, and an issue in July focuses on web-based software resources for the biological community. Nucleic Acids Research is indexed by several services including Abstracts on Hygiene and Communicable Diseases, Animal Breeding Abstracts, Agricultural Engineering Abstracts, Agbiotech News and Information, BIOSIS Previews, CAB Abstracts, and EMBASE.