Maximising informativeness for target capture-based phylogenomics in Erica (Ericaceae)
Seth D Musker, Nicolai M Nürk, Michael D Pirie
PhytoKeys 251: 87-118. Published 2025-01-16. DOI: https://doi.org/10.3897/phytokeys.251.136373
Abstract
Plant phylogenetics has been revolutionised in the genomic era, with target capture acting as the primary workhorse of most recent research in the new field of phylogenomics. Target capture (aka Hyb-Seq) allows researchers to sequence hundreds of genomic regions (loci) of their choosing, at relatively low cost per sample, from which to derive phylogenetically informative data. Although this highly flexible and widely applicable method has rightly earned its place as the field's de facto standard, it does not come without its challenges. In particular, users have to specify which loci to sequence, a surprisingly difficult task, especially when working with non-model groups, as it requires pre-existing genomic resources in the form of assembled genomes and/or transcriptomes. In the absence of taxon-specific genomic resources, target sets exist that are designed to work across broad taxonomic scales. However, the highly conserved loci that they target may lack informativeness for difficult phylogenetic problems, such as that presented by the rapid radiation of Erica in southern Africa. We designed a target set for Erica phylogenomics intended to maximise informativeness and minimise paralogy while maintaining universality by including genes from the widely used Angiosperms353 set. Comprising just over 300 genes, the targets had excellent recovery rates in roughly 90 Erica species as well as outgroups from Calluna, Daboecia, and Rhododendron, and had high information content as measured by parsimony-informative sites and Quartet Internode Resolution Probability (QIRP) at shallow nodes. Notably, QIRP was positively correlated with intron content, and including introns in targets, rather than recovering them via exon-flanking "bycatch", substantially improved intron recovery. Overall, our results show the value of building a custom target set, and we provide a suite of open-source tools that can be used to replicate our approach in other groups (https://github.com/SethMusker/TargetVet).
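One of the information-content measures named in the abstract, the number of parsimony-informative sites, can be illustrated with a minimal sketch. It uses the standard textbook definition (an alignment column is parsimony-informative if at least two character states each occur in at least two sequences) and is not taken from the authors' pipeline or from TargetVet. The function name and the toy alignment are hypothetical, and treating gaps and ambiguity codes as missing data is an assumption.

```python
from collections import Counter


def count_parsimony_informative_sites(alignment):
    """Count alignment columns in which at least two nucleotide states
    each occur in at least two sequences.

    Assumption: only A, C, G and T count as states; gaps and ambiguity
    codes are treated as missing data and ignored.
    """
    if not alignment:
        return 0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")

    valid_states = set("ACGT")
    informative = 0
    # zip(*alignment) iterates over columns of the alignment
    for column in zip(*alignment):
        counts = Counter(
            base for base in (c.upper() for c in column) if base in valid_states
        )
        # number of states shared by two or more sequences
        shared_states = sum(1 for n in counts.values() if n >= 2)
        if shared_states >= 2:
            informative += 1
    return informative


# Hypothetical toy alignment: four sequences, six aligned positions
toy_alignment = [
    "ACGTAC",
    "ACGTAT",
    "ATGAAC",
    "ATGA-T",
]
print(count_parsimony_informative_sites(toy_alignment))  # prints 3
```

In the toy alignment, the second, fourth and sixth columns each contain two states shared by two sequences, so they are counted; the remaining columns are invariant once the gap is ignored.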
About the journal:
PhytoKeys is a peer-reviewed, open-access, online and print, rapidly produced journal launched to support free exchange of ideas and information in systematic botany.
All papers published in PhytoKeys can be freely copied, downloaded, printed and distributed at no charge for the reader. Authors are thus encouraged to post the pdf files of published papers on their homepages or elsewhere to expedite distribution. There is no charge for color.