Towards the next generation of species delimitation methods: an overview of machine learning applications

IF 3.6 | Zone 1, Biology | JCR Q2, Biochemistry & Molecular Biology
Matheus M.A. Salles, Fabricius M.C.B. Domingos
DOI: 10.1016/j.ympev.2025.108368
Journal: Molecular Phylogenetics and Evolution, volume 210, Article 108368
Publication date: 2025-05-08
Publication type: Journal Article
Open access: no
Full text: https://www.sciencedirect.com/science/article/pii/S1055790325000855
Citations: 0

Abstract

Towards the next generation of species delimitation methods: an overview of machine learning applications
Species delimitation is the process of distinguishing between populations of the same species and distinct species of a particular group of organisms. Various methods exist for inferring species limits, whether based on morphological, molecular, or other types of data. In the case of methods based on DNA sequences, most of them are rooted in the coalescent theory. However, coalescence-based models have limitations, for instance regarding complex evolutionary scenarios and large datasets. In this context, machine learning (ML) can be considered as a promising analytical tool, and provides an effective way to explore dataset structures when species-level divergences are hypothesized. In this review, we examine the use of ML in species delimitation and provide an overview and critical appraisal of existing workflows. We also provide simple explanations on how the main types of ML approaches operate, which should help uninitiated researchers and students interested in the field. Our review suggests that while current ML methods designed to infer species limits are analytically powerful, they also present specific limitations and should not be considered as definitive alternatives to coalescent methods for species delimitation. Future ML enterprises to delimit species should consider the constraints related to the use of simulated data, as in other model-based methods relying on simulations. Conversely, the flexibility of ML algorithms offers a significant advantage by enabling the analysis of diverse data types (e.g., genetic and phenotypic) and handling large datasets effectively. We also propose best practices for the use of ML methods in species delimitation, offering insights into potential future applications. We expect that the proposed guidelines will be useful for enhancing the accessibility, effectiveness, and objectivity of ML in species delimitation.
Source journal
Molecular Phylogenetics and Evolution
Subject area: Biology, Evolutionary Biology
CiteScore: 7.50
Self-citation rate: 7.30%
Articles published: 249
Review time: 7.5 months
Journal description: Molecular Phylogenetics and Evolution is dedicated to bringing Darwin's dream within grasp - to "have fairly true genealogical trees of each great kingdom of Nature." The journal provides a forum for molecular studies that advance our understanding of phylogeny and evolution, further the development of phylogenetically more accurate taxonomic classifications, and ultimately bring a unified classification for all the ramifying lines of life. Phylogeographic studies will be considered for publication if they offer EXCEPTIONAL theoretical or empirical advances.