Identification of cell-type-specific, transcriptionally active transposable elements using long-read RNA-sequencing data-based comprehensive annotation.

Chaemin Lim, Hyunsu An, Jihwan Park
Genomics & Informatics 23(1):17. Journal Article. Published 2025-08-06. DOI: https://doi.org/10.1186/s44342-025-00048-1. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12326599/pdf/

Abstract

Background: Transposable element (TE)-derived transcripts have reported biological functions in physiological development and in the onset and progression of disease. However, research on locus-specific TE-derived transcript expression across human cell types remains limited.

Methods: To compile the TE-derived transcript annotation, we processed 2596 publicly available human long-read RNA-sequencing (LR RNA-seq) datasets covering 21 organs and 71 cell lines, from both healthy individuals and patients with various conditions. We established a pipeline for assembling transcripts containing TE sequences to measure transcriptionally active TE-derived transcripts in diverse tissues and cell types. Next, we applied our TE annotation to the Genotype-Tissue Expression (GTEx) single-cell RNA-sequencing (scRNA-seq) data from eight tissues.
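
As an illustration of what "transcripts containing TE sequences" can mean in practice, the following minimal sketch flags assembled transcripts whose exons overlap annotated TE intervals. It is not the authors' pipeline: the function names, the tuple layouts, the half-open coordinate convention, and the toy records (including the `*_example` TE names) are all assumptions made for this example.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def te_containing_transcripts(exons, te_intervals):
    """Map transcript id -> sorted TE names overlapping any of its exons.

    exons:        iterable of (transcript_id, chrom, start, end)   (hypothetical layout)
    te_intervals: iterable of (chrom, start, end, te_name)         (hypothetical layout)
    """
    # Group TE intervals by chromosome so each exon is compared only
    # against TEs on its own chromosome.
    tes_by_chrom = defaultdict(list)
    for chrom, start, end, name in te_intervals:
        tes_by_chrom[chrom].append((start, end, name))

    hits = defaultdict(set)
    for tx_id, chrom, start, end in exons:
        for te_start, te_end, name in tes_by_chrom[chrom]:
            if overlaps(start, end, te_start, te_end):
                hits[tx_id].add(name)
    return {tx_id: sorted(names) for tx_id, names in hits.items()}


# Toy data, invented for illustration only.
exons = [
    ("tx1", "chr1", 100, 200),
    ("tx1", "chr1", 500, 600),
    ("tx2", "chr1", 1000, 1100),
    ("tx3", "chr2", 100, 200),
]
te_intervals = [
    ("chr1", 150, 300, "L1_example"),
    ("chr1", 550, 560, "Alu_example"),
    ("chr2", 200, 400, "ERV_example"),
]

result = te_containing_transcripts(exons, te_intervals)
print(result)
```

The nested loop is quadratic per chromosome, which is acceptable for a toy example; a genome-scale run would normally use an interval tree or a sorted sweep instead.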

Results: We constructed the first transcriptome-based TE annotation using massive amounts of human LR RNA-seq data for use as a comprehensive reference to detect locus-specific TE-derived transcripts. Our annotation showed better detection accuracy for TE-derived transcripts than the RepeatMasker and GENCODE nonTE gene annotations. This annotation enabled the identification of novel TE-derived transcripts and their isoforms. We also identified alternative transcription end sites for long noncoding genes and confirmed previously annotated TE-nonTE gene fusion transcripts. Next, we applied our TE-derived transcript annotation to public scRNA-seq data from various human tissues and identified several cell-type-specific TE-derived transcripts in a locus-specific manner.
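
The abstract does not state how cell-type specificity was scored. As one generic possibility, the sketch below computes the tau specificity index, a commonly used measure that is 0 for uniform expression and 1 for expression confined to a single cell type. Its use here is an assumption, not the authors' reported method, and the function name and input values are hypothetical.

```python
def tau_index(expression):
    """Tau specificity index for one transcript across cell types.

    expression: non-negative mean expression values, one per cell type.
    Returns a value in [0, 1]: 0 = uniform, 1 = restricted to one cell type.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs at least two cell types")
    x_max = max(expression)
    if x_max <= 0:
        # Convention chosen for this sketch: an unexpressed transcript
        # is reported as non-specific rather than raising an error.
        return 0.0
    return sum(1 - x / x_max for x in expression) / (n - 1)


# Toy values, invented for illustration only.
specific = tau_index([10, 0, 0, 0])
uniform = tau_index([5, 5, 5, 5])
intermediate = tau_index([4, 2, 0])
print(specific, uniform, intermediate)
```

A transcript would then be called cell-type-specific when its tau exceeds a chosen cutoff; any such cutoff is an analysis decision that the abstract does not specify.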

Conclusions: We generated a comprehensive, TE-derived transcript annotation using large-scale, LR RNA-seq data. Researchers can use our TE reference annotation to analyze active TE transcripts and their splicing isoforms in specific transcriptome datasets and to detect de novo TE transcripts. The discovery of cell-type-specific TE-derived transcripts may help explain mechanisms underlying the maintenance of cellular identity and provide new insights into the pathological mechanisms of various diseases.
