Assexon: Assembling Exon Using Gene Capture Data

Hao Yuan, Calder J Atta, L. Tornabene, Chenhong Li
Evolutionary Bioinformatics Online. Published 2019-09-01. DOI: 10.1177/1176934319874792 (https://doi.org/10.1177/1176934319874792)

Abstract

Exon capture across species has been one of the most broadly applied approaches to acquire multi-locus data in phylogenomic studies of non-model organisms. Methods for assembling loci from short-read sequences (e.g., Illumina platforms) that rely on mapping reads to a reference genome may not be suitable for studies comprising species across a wide phylogenetic spectrum; thus, de novo assembly methods are more generally applied. Current approaches for assembling targeted exons from short reads are not particularly optimized, as they cannot (1) assemble loci with low read depth, (2) handle large files efficiently, or (3) reliably address issues with paralogs. Thus, we present Assexon: a streamlined pipeline that de novo assembles targeted exons and their flanking sequences from raw reads. We tested our method using reads from Lepisosteus osseus (4.37 Gb) and Boleophthalmus pectinirostris (2.43 Gb), which were captured using baits designed from the genome sequences of Lepisosteus oculatus and Oreochromis niloticus, respectively. We compared the performance of Assexon to PHYLUCE and HybPiper, which are commonly used pipelines for assembling ultra-conserved element (UCE) and Hyb-seq data. A custom exon capture analysis pipeline (CP) developed by Yuan et al. was compared as well. Assexon accurately assembled 3400 to 3800 (20%-28%) more loci than PHYLUCE and 1900 to 2300 (8%-14%) more loci than HybPiper across different levels of phylogenetic divergence. Assexon ran at least twice as fast as PHYLUCE and HybPiper. The number of loci assembled using CP was comparable to that of Assexon in both tests, while Assexon ran at least 7 times faster than CP. In addition, some steps of CP require user interaction and are not fully automated, and this user time was not counted in our calculation. Neither Assexon nor CP retrieved any paralogs in the testing runs, but PHYLUCE and HybPiper did. In conclusion, Assexon is a tool for accurate and efficient assembly of large read sets from exon capture experiments. Furthermore, Assexon includes scripts to filter poorly aligned coding regions and flanking regions, calculate summary statistics of loci, and select loci with reliable phylogenetic signal. Assexon is available at https://github.com/yhadevol/Assexon.
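To make the idea of per-locus summary statistics concrete, the following is a minimal Python sketch of the kind of calculation such a script might perform on one aligned locus. It is not Assexon's code and does not reflect its command-line interface. The names `LocusStats`, `parse_fasta`, `locus_stats`, and `keep_locus`, the choice of statistics, and the filtering thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LocusStats:
    n_taxa: int          # number of sequences in the alignment
    aln_length: int      # number of alignment columns
    gc_content: float    # (G + C) / (A + C + G + T)
    missing_frac: float  # share of '-', 'N' and '?' characters


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict {name: sequence}."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {k: "".join(v) for k, v in seqs.items()}


def locus_stats(seqs):
    """Compute summary statistics for one aligned locus."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences in an alignment must have equal length")
    joined = "".join(seqs.values())
    acgt = sum(joined.count(base) for base in "ACGT")
    gc = joined.count("G") + joined.count("C")
    missing = sum(joined.count(char) for char in "-N?")
    return LocusStats(
        n_taxa=len(seqs),
        aln_length=lengths.pop(),
        gc_content=gc / acgt if acgt else 0.0,
        missing_frac=missing / len(joined) if joined else 0.0,
    )


def keep_locus(stats, min_taxa=2, max_missing=0.5):
    """Hypothetical filter: thresholds are illustrative, not Assexon defaults."""
    return stats.n_taxa >= min_taxa and stats.missing_frac <= max_missing


# Toy alignment with two taxa and eight columns.
EXAMPLE = """>taxonA
ATGC--GC
>taxonB
ATGCNNGC
"""

stats = locus_stats(parse_fasta(EXAMPLE))
print(stats, keep_locus(stats))
```

In practice, a filter like `keep_locus` would be run over every locus alignment in a directory, keeping only loci that have enough taxa and little missing data before downstream phylogenetic analysis.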