Sonia Chothani, Lena Ho, Sebastian Schafer, Owen Rackham
{"title":"Discovering microproteins: making the most of ribosome profiling data.","authors":"Sonia Chothani, Lena Ho, Sebastian Schafer, Owen Rackham","doi":"10.1080/15476286.2023.2279845","DOIUrl":null,"url":null,"abstract":"<p><p>Building a reference set of protein-coding open reading frames (ORFs) has revolutionized biological process discovery and understanding. Traditionally, gene models have been confirmed using cDNA sequencing and encoded translated regions inferred using sequence-based detection of start and stop combinations longer than 100 amino-acids to prevent false positives. This has led to small ORFs (smORFs) and their encoded proteins left un-annotated. Ribo-seq allows deciphering translated regions from untranslated irrespective of the length. In this review, we describe the power of Ribo-seq data in detection of smORFs while discussing the major challenge posed by data-quality, -depth and -sparseness in identifying the start and end of smORF translation. In particular, we outline smORF cataloguing efforts in humans and the large differences that have arisen due to variation in data, methods and assumptions. Although current versions of smORF reference sets can already be used as a powerful tool for hypothesis generation, we recommend that future editions should consider these data limitations and adopt unified processing for the community to establish a canonical catalogue of translated smORFs.</p>","PeriodicalId":21351,"journal":{"name":"RNA Biology","volume":null,"pages":null},"PeriodicalIF":3.6000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10730196/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"RNA Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1080/15476286.2023.2279845","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/11/27 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Building a reference set of protein-coding open reading frames (ORFs) has revolutionized biological process discovery and understanding. Traditionally, gene models have been confirmed using cDNA sequencing and encoded translated regions inferred using sequence-based detection of start and stop combinations longer than 100 amino-acids to prevent false positives. This has led to small ORFs (smORFs) and their encoded proteins left un-annotated. Ribo-seq allows deciphering translated regions from untranslated irrespective of the length. In this review, we describe the power of Ribo-seq data in detection of smORFs while discussing the major challenge posed by data-quality, -depth and -sparseness in identifying the start and end of smORF translation. In particular, we outline smORF cataloguing efforts in humans and the large differences that have arisen due to variation in data, methods and assumptions. Although current versions of smORF reference sets can already be used as a powerful tool for hypothesis generation, we recommend that future editions should consider these data limitations and adopt unified processing for the community to establish a canonical catalogue of translated smORFs.
期刊介绍:
RNA has played a central role in all cellular processes since the beginning of life: decoding the genome, regulating gene expression, mediating molecular interactions, catalyzing chemical reactions. RNA Biology, as a leading journal in the field, provides a platform for presenting and discussing cutting-edge RNA research.
RNA Biology brings together a multidisciplinary community of scientists working in the areas of:
Transcription and splicing
Post-transcriptional regulation of gene expression
Non-coding RNAs
RNA localization
Translation and catalysis by RNA
Structural biology
Bioinformatics
RNA in disease and therapy