Diego Guerra-Almeida, Diogo Antonio Tschoeke, Rodrigo Nunes-da-Fonseca
{"title":"通过全面的转录特征分类了解小ORF多样性。","authors":"Diego Guerra-Almeida, Diogo Antonio Tschoeke, Rodrigo Nunes-da-Fonseca","doi":"10.1093/dnares/dsab007","DOIUrl":null,"url":null,"abstract":"<p><p>Small open reading frames (small ORFs/sORFs/smORFs) are potentially coding sequences smaller than 100 codons that have historically been considered junk DNA by gene prediction software and in annotation screening; however, the advent of next-generation sequencing has contributed to the deeper investigation of junk DNA regions and their transcription products, resulting in the emergence of smORFs as a new focus of interest in systems biology. Several smORF peptides were recently reported in non-canonical mRNAs as new players in numerous biological contexts; however, their relevance is still overlooked in coding potential analysis. Hence, this review proposes a smORF classification based on transcriptional features, discussing the most promising approaches to investigate smORFs based on their different characteristics. First, smORFs were divided into non-expressed (intergenic) and expressed (genic) smORFs. Second, genic smORFs were classified as smORFs located in non-coding RNAs (ncRNAs) or canonical mRNAs. Finally, smORFs in ncRNAs were further subdivided into sequences located in small or long RNAs, whereas smORFs located in canonical mRNAs were subdivided into several specific classes depending on their localization along the gene. We hope that this review provides new insights into large-scale annotations and reinforces the role of smORFs as essential components of a hidden coding DNA world.</p>","PeriodicalId":11212,"journal":{"name":"DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/18/a6/dsab007.PMC8435553.pdf","citationCount":"9","resultStr":"{\"title\":\"Understanding small ORF diversity through a comprehensive transcription feature classification.\",\"authors\":\"Diego Guerra-Almeida, Diogo Antonio Tschoeke, Rodrigo Nunes-da-Fonseca\",\"doi\":\"10.1093/dnares/dsab007\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Small open reading frames (small ORFs/sORFs/smORFs) are potentially coding sequences smaller than 100 codons that have historically been considered junk DNA by gene prediction software and in annotation screening; however, the advent of next-generation sequencing has contributed to the deeper investigation of junk DNA regions and their transcription products, resulting in the emergence of smORFs as a new focus of interest in systems biology. Several smORF peptides were recently reported in non-canonical mRNAs as new players in numerous biological contexts; however, their relevance is still overlooked in coding potential analysis. Hence, this review proposes a smORF classification based on transcriptional features, discussing the most promising approaches to investigate smORFs based on their different characteristics. First, smORFs were divided into non-expressed (intergenic) and expressed (genic) smORFs. Second, genic smORFs were classified as smORFs located in non-coding RNAs (ncRNAs) or canonical mRNAs. Finally, smORFs in ncRNAs were further subdivided into sequences located in small or long RNAs, whereas smORFs located in canonical mRNAs were subdivided into several specific classes depending on their localization along the gene. We hope that this review provides new insights into large-scale annotations and reinforces the role of smORFs as essential components of a hidden coding DNA world.</p>\",\"PeriodicalId\":11212,\"journal\":{\"name\":\"DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-09-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/18/a6/dsab007.PMC8435553.pdf\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1093/dnares/dsab007\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/dnares/dsab007","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9
摘要
小开放阅读框(Small open reading frame, orf / sorf / smorf)是指小于100个密码子的潜在编码序列,在基因预测软件和注释筛选中历来被认为是垃圾DNA;然而,下一代测序的出现促进了对垃圾DNA区域及其转录产物的深入研究,导致smorf的出现成为系统生物学的新热点。最近报道了几种smORF肽在非规范mrna中作为许多生物学背景下的新参与者;然而,它们的相关性在编码潜能分析中仍然被忽视。因此,本文提出了一种基于转录特征的smORF分类方法,并根据smORF的不同特征讨论了最有希望研究smORF的方法。首先,将smorf分为非表达(基因间)和表达(基因)smorf。其次,基因smorf被分为位于非编码rna (ncRNAs)和规范mrna中的smorf。最后,ncRNAs中的smorf被进一步细分为位于小rna或长rna中的序列,而位于规范mrna中的smorf则根据其沿基因的定位被细分为几个特定的类别。我们希望这篇综述能够为大规模注释提供新的见解,并加强smorf作为隐藏编码DNA世界的重要组成部分的作用。
Understanding small ORF diversity through a comprehensive transcription feature classification.
Small open reading frames (small ORFs/sORFs/smORFs) are potentially coding sequences smaller than 100 codons that have historically been considered junk DNA by gene prediction software and in annotation screening; however, the advent of next-generation sequencing has contributed to the deeper investigation of junk DNA regions and their transcription products, resulting in the emergence of smORFs as a new focus of interest in systems biology. Several smORF peptides were recently reported in non-canonical mRNAs as new players in numerous biological contexts; however, their relevance is still overlooked in coding potential analysis. Hence, this review proposes a smORF classification based on transcriptional features, discussing the most promising approaches to investigate smORFs based on their different characteristics. First, smORFs were divided into non-expressed (intergenic) and expressed (genic) smORFs. Second, genic smORFs were classified as smORFs located in non-coding RNAs (ncRNAs) or canonical mRNAs. Finally, smORFs in ncRNAs were further subdivided into sequences located in small or long RNAs, whereas smORFs located in canonical mRNAs were subdivided into several specific classes depending on their localization along the gene. We hope that this review provides new insights into large-scale annotations and reinforces the role of smORFs as essential components of a hidden coding DNA world.