Reproducible evaluation of transposable element detectors with McClintock 2 guides accurate inference of Ty insertion patterns in yeast.

Impact factor 4.7; Zone 2 (Biology); JCR Q1 (Genetics & Heredity)
Jingxuan Chen, Preston J Basting, Shunhua Han, David J Garfinkel, Casey M Bergman
Mobile DNA 14(1):8. Published 2023-07-14. DOI: 10.1186/s13100-023-00296-4 (https://doi.org/10.1186/s13100-023-00296-4)

Abstract

Background: Many computational methods have been developed to detect non-reference transposable element (TE) insertions using short-read whole genome sequencing data. The diversity and complexity of such methods often present challenges to new users seeking to reproducibly install, execute, or evaluate multiple TE insertion detectors.

Results: We previously developed the McClintock meta-pipeline to facilitate the installation, execution, and evaluation of six first-generation short-read TE detectors. Here, we report a completely re-implemented version of McClintock written in Python using Snakemake and Conda that improves its installation, error handling, speed, stability, and extensibility. McClintock 2 now includes 12 short-read TE detectors, auxiliary pre-processing and analysis modules, interactive HTML reports, and a simulation framework to reproducibly evaluate the accuracy of component TE detectors. When applied to the model microbial eukaryote Saccharomyces cerevisiae, we find substantial variation in the ability of McClintock 2 components to identify the precise locations of non-reference TE insertions, with RelocaTE2 showing the highest recall and precision in simulated data. We find that RelocaTE2, TEMP, TEMP2 and TEBreak provide consistent estimates of ~50 non-reference TE insertions per strain and that Ty2 has the highest number of non-reference TE insertions in a species-wide panel of ~1000 yeast genomes. Finally, we show that best-in-class predictors for yeast applied to resequencing data have sufficient resolution to reveal a dyad pattern of integration in nucleosome-bound regions upstream of yeast tRNA genes for Ty1, Ty2, and Ty4, allowing us to extend knowledge about fine-scale target preferences revealed previously for experimentally-induced Ty1 insertions to spontaneous insertions for other copia-superfamily retrotransposons in yeast.
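
As a rough illustration of what scoring a detector against simulated data involves, the sketch below computes precision and recall for a set of predicted insertion sites given the known simulated sites. It is not McClintock's evaluation code: the `Insertion` record, the `score_predictions` function, and the `window` tolerance (in base pairs) are hypothetical names introduced here, and the greedy one-to-one matching is a simplifying assumption.

```python
from typing import List, NamedTuple, Tuple


class Insertion(NamedTuple):
    """A single TE insertion call (hypothetical record for this sketch)."""
    chrom: str
    pos: int
    family: str


def score_predictions(
    truth: List[Insertion],
    predicted: List[Insertion],
    window: int = 0,
) -> Tuple[float, float]:
    """Return (precision, recall) for predicted vs. simulated insertions.

    A prediction counts as a true positive if it shares chromosome and TE
    family with a not-yet-matched simulated insertion and lies within
    `window` bp of it. Each simulated insertion can be matched only once.
    """
    matched = set()  # indices of simulated insertions already claimed
    true_positives = 0
    for pred in predicted:
        for i, true in enumerate(truth):
            if i in matched:
                continue
            same_site = (
                pred.chrom == true.chrom
                and pred.family == true.family
                and abs(pred.pos - true.pos) <= window
            )
            if same_site:
                matched.add(i)
                true_positives += 1
                break
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy data: two simulated insertions, three calls from an imaginary detector.
simulated = [Insertion("chrI", 1000, "Ty1"), Insertion("chrII", 5000, "Ty2")]
calls = [
    Insertion("chrI", 1002, "Ty1"),   # within 5 bp of a simulated Ty1 site
    Insertion("chrII", 9000, "Ty2"),  # right family, wrong location
    Insertion("chrI", 1000, "Ty4"),   # right location, wrong family
]
precision, recall = score_predictions(simulated, calls, window=5)
```

Setting `window=0` demands exact positions, which is the stricter reading of "precise locations"; a wider window credits detectors that find the right neighbourhood but not the exact breakpoint.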

Conclusion: McClintock (https://github.com/bergmanlab/mcclintock/) provides a user-friendly pipeline for the identification of TEs in short-read WGS data using multiple TE detectors, which should benefit researchers studying TE insertion variation in a wide range of different organisms. Application of the improved McClintock system to simulated and empirical yeast genome data reveals best-in-class methods and novel biological insights for one of the most widely-studied model eukaryotes and provides a paradigm for evaluating and selecting non-reference TE detectors in other species.
