Detecting transposable elements in long read genomes using sTELLeR.

Kristine Bilgrav Saether, Jesper Eisfeldt
Bioinformatics (Oxford, England). Journal Article. Publication date: 2024-11-18. DOI: 10.1093/bioinformatics/btae686 (https://doi.org/10.1093/bioinformatics/btae686)

Abstract

Motivation: Repeat elements such as transposable elements (TEs) are highly repetitive DNA sequences that make up around 50% of the genome. TEs such as Alu, SVA, HERV and L1 elements can cause disease by disrupting genes, causing frameshift mutations or altering splicing patterns. These elements are challenging to characterize using short-read genome sequencing (srGS) because of its short read length and the repetitive nature of TEs. Long-read genome sequencing (lrGS) enables reads to bridge TEs, increasing resolution across repetitive DNA sequences. lrGS therefore presents an opportunity for improved TE detection and analysis, not only from a research perspective but also for future clinical detection. When choosing an lrGS TE caller, parameters such as runtime, CPU hours, sensitivity, precision and ease of inclusion in pipelines are crucial for efficient detection.
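Sensitivity and precision, two of the selection criteria named above, are typically computed from counts of true-positive, false-positive and false-negative calls against a truth set. A minimal sketch follows; the function names and the counts are hypothetical and are not taken from the sTELLeR benchmark.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of reported TE calls that are real: TP / (TP + FP)."""
    total = tp + fp
    return tp / total if total else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of real TE insertions that were reported: TP / (TP + FN)."""
    total = tp + fn
    return tp / total if total else 0.0


# Hypothetical counts for illustration only (not results from the paper).
tp, fp, fn = 90, 10, 30
print(f"precision   = {precision(tp, fp):.2f}")    # 90 / 100
print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # 90 / 120
```

Both functions return 0.0 when the denominator is zero, which is one common convention; a caller that reports nothing could equally be treated as having undefined precision.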

Results: We therefore developed sTELLeR, (s) Transposable ELement in Long (e) Read, for accurate, fast and effective TE detection. In particular, sTELLeR exhibits higher precision and sensitivity for calling Alu elements than similar tools. The caller is 5-48 times as fast as competing callers and uses less than 2% of their CPU hours. The caller is haplotype aware and outputs results in a VCF file, enabling compatibility with other variant callers and downstream analysis.
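Because the results are written as VCF, downstream tools can read them with generic VCF handling. The sketch below, using only the Python standard library, tallies calls by ALT allele and reads the genotype of the first sample. It assumes only the general tab-separated VCF column layout; the example records, the symbolic ALT alleles, the sample name and the helper names (TECall, parse_vcf) are invented for illustration and are not confirmed sTELLeR output.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class TECall(NamedTuple):
    chrom: str
    pos: int
    alt: str
    genotype: str


def parse_vcf(lines: Iterable[str]) -> Iterator[TECall]:
    """Yield one TECall per data line, skipping header and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        genotype = "."
        if len(fields) >= 10:
            # FORMAT (column 9) names the per-sample keys in column 10.
            keys = fields[8].split(":")
            values = fields[9].split(":")
            genotype = dict(zip(keys, values)).get("GT", ".")
        yield TECall(chrom, pos, alt, genotype)


# Invented records for illustration; real caller output may differ.
EXAMPLE_VCF = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1
chr1\t10500\t.\tN\t<INS:ME:ALU>\t.\tPASS\tSVTYPE=INS\tGT\t0|1
chr2\t20400\t.\tN\t<INS:ME:L1>\t.\tPASS\tSVTYPE=INS\tGT\t1|1
chr3\t30900\t.\tN\t<INS:ME:ALU>\t.\tPASS\tSVTYPE=INS\tGT\t1|0
"""

calls = list(parse_vcf(EXAMPLE_VCF.splitlines()))
counts = Counter(call.alt for call in calls)
for alt, n in counts.items():
    print(alt, n)
```

In a phased genotype such as 0|1, the side of the bar on which the 1 appears indicates which haplotype carries the insertion, which is the kind of information a haplotype-aware caller can report.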

Availability: sTELLeR is a Python-based tool and is available at https://github.com/kristinebilgrav/sTELLeR. Altogether, we show that sTELLeR is a fast, sensitive and precise caller for the detection of TEs, and that it can easily be integrated into variant calling workflows.

Supplementary information: Supplementary data are available at Bioinformatics online.
