Data‐driven guidelines for phylogenomic analyses using SNP data

IF 2.7 | Zone 3, Biology | Q2, PLANT SCIENCES
J. Suissa, Gisel Y. De La Cerda, Leland C. Graber, Chloe Jelley, David Wickell, Heather R. Phillips, Ayress D. Grinage, C. S. Moreau, Chelsea D. Specht, Jeff J. Doyle, Jacob B. Landis
Journal: Applications in Plant Sciences
Published: 2024-08-09
DOI: 10.1002/aps3.11611 (https://doi.org/10.1002/aps3.11611)
Citations: 0

Abstract

Data‐driven guidelines for phylogenomic analyses using SNP data
There is a general lack of consensus on the best practices for filtering of single‐nucleotide polymorphisms (SNPs) and whether it is better to use SNPs or include flanking regions (full “locus”) in phylogenomic analyses and subsequent comparative methods. Using genotyping‐by‐sequencing data from 22 Glycine species, we assessed the effects of SNP vs. locus usage and SNP retention stringency. We compared branch length, node support, and divergence time estimation across 16 datasets with varying amounts of missing data and total size. Our results revealed five aspects of phylogenomic data usage that may be generally applicable: (1) tree topology is largely congruent across analyses; (2) filtering strictly for SNP retention (e.g., 90–100%) reduces support and can alter some inferred relationships; (3) absolute branch lengths vary by two orders of magnitude between SNP and locus datasets; (4) data type and branch length variation have little effect on divergence time estimation; and (5) phylograms alter the estimation of ancestral states and rates of morphological evolution. Using SNP or locus datasets does not alter phylogenetic inference significantly, unless researchers want or need to use absolute branch lengths. We recommend against using excessive filtering thresholds for SNP retention to reduce the risk of producing inconsistent topologies and generating low support.
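To make the idea of an SNP retention threshold concrete, the following is a minimal Python sketch. It assumes that "retention" means the fraction of samples in which a site has a called genotype, and that an uncalled genotype is written as "N". The function name, the missing-data symbol, and the toy alignment are illustrative assumptions, not the authors' pipeline or data.

```python
from typing import Dict

MISSING = "N"  # assumed symbol for an uncalled genotype


def filter_snps_by_retention(
    alignment: Dict[str, str], min_retention: float
) -> Dict[str, str]:
    """Keep only sites called in at least `min_retention` of the samples."""
    if not alignment:
        return {}
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("all sequences must have the same length")

    n_samples = len(names)
    kept = []
    for i in range(n_sites):
        # Count the samples with a called (non-missing) genotype at site i.
        called = sum(1 for name in names if alignment[name][i] != MISSING)
        if called / n_samples >= min_retention:
            kept.append(i)

    # Rebuild each sequence from the retained columns only.
    return {name: "".join(alignment[name][i] for i in kept) for name in names}


# Hypothetical toy SNP matrix: four samples, five sites.
toy = {
    "sp1": "ACGTN",
    "sp2": "ANGTN",
    "sp3": "ACNTN",
    "sp4": "ACGTA",
}

lenient = filter_snps_by_retention(toy, 0.5)  # drops only the last site
strict = filter_snps_by_retention(toy, 1.0)   # keeps fully called sites only
print(len(lenient["sp1"]), len(strict["sp1"]))
```

In this toy case the lenient threshold keeps four sites and the strict threshold keeps two, which shows how raising the threshold shrinks the matrix. The sketch says nothing about node support or topology; those effects are what the study measures empirically.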
Source journal
CiteScore: 7.30
Self-citation rate: 0.00%
Articles published: 50
Review time: 12 weeks
About the journal: Applications in Plant Sciences (APPS) is a monthly, peer-reviewed, open access journal promoting the rapid dissemination of newly developed, innovative tools and protocols in all areas of the plant sciences, including genetics, structure, function, development, evolution, systematics, and ecology. Given the rapid progress today in technology and its application in the plant sciences, the goal of APPS is to foster communication within the plant science community to advance scientific research. APPS is a publication of the Botanical Society of America, originating in 2009 as the American Journal of Botany's online-only section, AJB Primer Notes & Protocols in the Plant Sciences. APPS publishes the following types of articles: (1) Protocol Notes describe new methods and technological advancements; (2) Genomic Resources Articles characterize the development and demonstrate the usefulness of newly developed genomic resources, including transcriptomes; (3) Software Notes detail new software applications; (4) Application Articles illustrate the application of a new protocol, method, or software application within the context of a larger study; (5) Review Articles evaluate available techniques, methods, or protocols; (6) Primer Notes report novel genetic markers with evidence of wide applicability.