{"title":"使用基于长读rna测序数据的综合注释鉴定细胞类型特异性,转录活性转座元件。","authors":"Chaemin Lim, Hyunsu An, Jihwan Park","doi":"10.1186/s44342-025-00048-1","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>The biological functions of transposable element (TE)-derived transcripts during physiological development, disease development, and progression have been previously reported. However, research on locus-specific TE-derived transcript expression in various human cell types remains limited.</p><p><strong>Methods: </strong>We processed 2596 publicly available human long-read RNA-sequencing (LR RNA-seq) datasets covering 21 organs and 71 cell lines in both healthy individuals and diseased patients with various conditions to compile this TE-derived transcript annotation. We established a pipeline for assembling transcripts containing TE sequences to measure transcriptionally active TE-derived transcripts in diverse tissues and cell types. Next, we applied our TE annotation to the Genotype-Tissue Expression (GTEx) single-cell RNA-sequencing (scRNA-seq) data from eight tissues.</p><p><strong>Results: </strong>We constructed the first transcriptom6e-based TE annotation using massive amounts of human LR RNA-seq data for use as a comprehensive reference to detect locus-specific TE-derived transcripts. Our annotation showed better detection accuracy for TE-derived transcripts than the RepeatMasker and GENCODE nonTE gene annotations. This annotation enabled the identification of novel TE-derived transcripts and their isoforms. We also identified alternative transcription end sites for long noncoding genes and confirmed previously annotated TE-nonTE gene fusion transcripts. Next, we applied our TE-derived transcript annotation to public scRNA-seq data from various human tissues and identified several cell-type-specific TE-derived transcripts in a locus-specific manner.</p><p><strong>Conclusions: </strong>We generated a comprehensive, TE-derived transcript annotation using large-scale, LR RNA-seq data. Researchers can use our TE reference annotation to analyze active TE transcripts and their splicing isoforms in specific transcriptome datasets and to detect de novo TE transcripts. The discovery of cell-type-specific TE-derived transcripts may help explain mechanisms underlying the maintenance of cellular identity and provide new insights into the pathological mechanisms of various diseases.</p>","PeriodicalId":94288,"journal":{"name":"Genomics & informatics","volume":"23 1","pages":"17"},"PeriodicalIF":0.0000,"publicationDate":"2025-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12326599/pdf/","citationCount":"0","resultStr":"{\"title\":\"Identification of cell-type-specific, transcriptionally active transposable elements using long-read RNA-sequencing data-based comprehensive annotation.\",\"authors\":\"Chaemin Lim, Hyunsu An, Jihwan Park\",\"doi\":\"10.1186/s44342-025-00048-1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>The biological functions of transposable element (TE)-derived transcripts during physiological development, disease development, and progression have been previously reported. However, research on locus-specific TE-derived transcript expression in various human cell types remains limited.</p><p><strong>Methods: </strong>We processed 2596 publicly available human long-read RNA-sequencing (LR RNA-seq) datasets covering 21 organs and 71 cell lines in both healthy individuals and diseased patients with various conditions to compile this TE-derived transcript annotation. We established a pipeline for assembling transcripts containing TE sequences to measure transcriptionally active TE-derived transcripts in diverse tissues and cell types. Next, we applied our TE annotation to the Genotype-Tissue Expression (GTEx) single-cell RNA-sequencing (scRNA-seq) data from eight tissues.</p><p><strong>Results: </strong>We constructed the first transcriptom6e-based TE annotation using massive amounts of human LR RNA-seq data for use as a comprehensive reference to detect locus-specific TE-derived transcripts. Our annotation showed better detection accuracy for TE-derived transcripts than the RepeatMasker and GENCODE nonTE gene annotations. This annotation enabled the identification of novel TE-derived transcripts and their isoforms. We also identified alternative transcription end sites for long noncoding genes and confirmed previously annotated TE-nonTE gene fusion transcripts. Next, we applied our TE-derived transcript annotation to public scRNA-seq data from various human tissues and identified several cell-type-specific TE-derived transcripts in a locus-specific manner.</p><p><strong>Conclusions: </strong>We generated a comprehensive, TE-derived transcript annotation using large-scale, LR RNA-seq data. Researchers can use our TE reference annotation to analyze active TE transcripts and their splicing isoforms in specific transcriptome datasets and to detect de novo TE transcripts. The discovery of cell-type-specific TE-derived transcripts may help explain mechanisms underlying the maintenance of cellular identity and provide new insights into the pathological mechanisms of various diseases.</p>\",\"PeriodicalId\":94288,\"journal\":{\"name\":\"Genomics & informatics\",\"volume\":\"23 1\",\"pages\":\"17\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-08-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12326599/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Genomics & informatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1186/s44342-025-00048-1\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Genomics & informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/s44342-025-00048-1","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Identification of cell-type-specific, transcriptionally active transposable elements using long-read RNA-sequencing data-based comprehensive annotation.
Background: The biological functions of transposable element (TE)-derived transcripts during physiological development, disease development, and progression have been previously reported. However, research on locus-specific TE-derived transcript expression in various human cell types remains limited.
Methods: We processed 2596 publicly available human long-read RNA-sequencing (LR RNA-seq) datasets covering 21 organs and 71 cell lines in both healthy individuals and diseased patients with various conditions to compile this TE-derived transcript annotation. We established a pipeline for assembling transcripts containing TE sequences to measure transcriptionally active TE-derived transcripts in diverse tissues and cell types. Next, we applied our TE annotation to the Genotype-Tissue Expression (GTEx) single-cell RNA-sequencing (scRNA-seq) data from eight tissues.
Results: We constructed the first transcriptom6e-based TE annotation using massive amounts of human LR RNA-seq data for use as a comprehensive reference to detect locus-specific TE-derived transcripts. Our annotation showed better detection accuracy for TE-derived transcripts than the RepeatMasker and GENCODE nonTE gene annotations. This annotation enabled the identification of novel TE-derived transcripts and their isoforms. We also identified alternative transcription end sites for long noncoding genes and confirmed previously annotated TE-nonTE gene fusion transcripts. Next, we applied our TE-derived transcript annotation to public scRNA-seq data from various human tissues and identified several cell-type-specific TE-derived transcripts in a locus-specific manner.
Conclusions: We generated a comprehensive, TE-derived transcript annotation using large-scale, LR RNA-seq data. Researchers can use our TE reference annotation to analyze active TE transcripts and their splicing isoforms in specific transcriptome datasets and to detect de novo TE transcripts. The discovery of cell-type-specific TE-derived transcripts may help explain mechanisms underlying the maintenance of cellular identity and provide new insights into the pathological mechanisms of various diseases.