Using natural history to guide supervised machine learning for cryptic species delimitation with genetic data

IF 2.6 | Zone 2 (Biology) | JCR Q1 (Zoology)
Shahan Derkarabetian, James Starrett, Marshal Hedin
Frontiers in Zoology. Published 2022-02-22. Journal Article.
DOI: 10.1186/s12983-022-00453-0 (https://doi.org/10.1186/s12983-022-00453-0)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8862334/pdf/
Citations: 11

Abstract


The diversity of biological and ecological characteristics of organisms, and the underlying genetic patterns and processes of speciation, make the development of universally applicable genetic species delimitation methods challenging. Many approaches, like those incorporating the multispecies coalescent, sometimes delimit populations rather than species and so overestimate species numbers. This issue is exacerbated in taxa with inherently high population structure due to low dispersal ability, and in cryptic species resulting from nonecological speciation. These taxa present a conundrum when delimiting species: analyses rely heavily, if not entirely, on genetic data, which over-split species, while other lines of evidence lump them. We showcase this conundrum in the harvester Theromaster brunneus, a low-dispersal taxon with a wide geographic distribution and high potential for cryptic species. Integrating morphological, mitochondrial, and sub-genomic (double-digest RADSeq and ultraconserved elements) data, we find high discordance across analyses and data types in the number of inferred species, with further evidence that multispecies coalescent approaches over-split. We demonstrate the power of a supervised machine learning approach in effectively delimiting cryptic species by creating a "custom" training data set derived from a well-studied lineage with biological characteristics similar to those of Theromaster. This novel approach uses known taxa with particular biological characteristics to inform unknown taxa with similar characteristics, using modern computational tools ideally suited for species delimitation. The approach also considers the natural history of organisms to make more biologically informed species delimitation decisions, and in principle is broadly applicable to taxa across the tree of life.
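The idea of training on a well-studied lineage and then classifying a poorly known one can be sketched in a few lines. The abstract does not name the classifier or the input features, so everything below is an assumption made for illustration: the nearest-centroid classifier, the two features (a differentiation index and a sequence divergence per population pair), the labels, and every number are hypothetical and are not taken from the paper.

```python
from statistics import mean

# Hypothetical toy features for pairs of populations in a well-studied
# reference lineage: (genetic differentiation, sequence divergence).
# All values are invented for illustration, not data from the paper.
TRAINING_PAIRS = [
    ((0.10, 0.005), "same_species"),
    ((0.15, 0.008), "same_species"),
    ((0.20, 0.010), "same_species"),
    ((0.60, 0.040), "different_species"),
    ((0.70, 0.055), "different_species"),
    ((0.80, 0.060), "different_species"),
]


def fit_centroids(labelled_pairs):
    """Return the mean feature vector (centroid) of each class."""
    by_label = {}
    for features, label in labelled_pairs:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# "Training" on the known lineage.
centroids = fit_centroids(TRAINING_PAIRS)

# Hypothetical population pairs from the focal, poorly known taxon.
focal_pairs = {
    "popA_vs_popB": (0.18, 0.009),
    "popA_vs_popC": (0.75, 0.050),
}
calls = {name: predict(centroids, feats) for name, feats in focal_pairs.items()}
print(calls)
```

The design point this illustrates is only the transfer step: the decision boundary is learned from taxa whose species limits are already accepted, then applied unchanged to the focal taxon. A real analysis would need properly computed summary statistics, feature scaling, and a validated classifier.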

Source journal: Frontiers in Zoology
CiteScore: 4.90
Self-citation rate: 0.00%
Articles published: 29
Review time: >12 weeks
About the journal: Frontiers in Zoology is an open access, peer-reviewed online journal publishing high quality research articles and reviews on all aspects of animal life. As a biological discipline, zoology has one of the longest histories. Today it occasionally appears as though, due to the rapid expansion of life sciences, zoology has been replaced by more or less independent sub-disciplines amongst which exchange is often sparse. However, the recent advance of molecular methodology into "classical" fields of biology, and the development of theories that can explain phenomena on different levels of organisation, have led to a re-integration of zoological disciplines, promoting a broader than usual approach to zoological questions. Zoology has re-emerged as an integrative discipline encompassing the most diverse aspects of animal life, from the level of the gene to the level of the ecosystem. Frontiers in Zoology is the first open access journal focusing on zoology as a whole. It aims to represent and re-unite the various disciplines that look at animal life from different perspectives and to provide the basis for a comprehensive understanding of zoological phenomena on all levels of analysis. Frontiers in Zoology provides a unique opportunity to publish high quality research and reviews on zoological issues that will be internationally accessible to any reader at no cost. The journal was initiated and is supported by the Deutsche Zoologische Gesellschaft, one of the largest national zoological societies with more than a century-long tradition in promoting high-level zoological research.