A framework for identifying the polyploid complex in Rorippa (Brassicaceae): combining trait evolution, herbarium records, and machine learning.

IF 3.6, Zone 2 (Biology), JCR Q1 (Plant Sciences)
Ting-Shen Han, Jun-Xian Lv, Yao-Wu Xing
Annals of Botany. Journal Article. Published 2026-03-05. DOI: 10.1093/aob/mcag050 (https://doi.org/10.1093/aob/mcag050)
Citations: 0

Abstract

Background and aims: Species identification in polyploid plants remains challenging due to morphological continuity and genomic redundancy. Such taxonomic uncertainties obscure evolutionary or ecological inference. A critical solution involves the reassessment of polyploid collections using stable diagnostic traits and integrative approaches. Here, we examined the Rorippa dubia-indica complex (Brassicaceae), a morphologically overlapping tetraploid-hexaploid lineage natively distributed in East Asia.

Methods: We developed a framework that integrates experimental phenotyping, herbarium reassessment, and computational modeling for secondary species assessment of polyploid plants. The framework incorporates spatiotemporal data from 3,136 field-collected (2017-2020) and 2,015 herbarium (1893-2021) specimens. Species were circumscribed using experimental assessments of anatomical, cytological, and morphological traits, interpreted within a phylogenetically informed evolutionary context. Stable diagnostic traits were then applied to reidentify specimens for improved species distribution models. Finally, curated trait and species data were used to train machine learning classification models to reconstruct the diagnostic rationale underlying specimen identification.
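The last step of the Methods, training a classifier on curated traits to recover the diagnostic rationale, can be illustrated with a very small example. The sketch below is not the authors' model: it swaps in a one-level decision tree (a decision stump) written in plain Python, and the specimen values, the trait names `seed_rows`, `petal_count`, and `leaf_lobes`, and the species labels "A" and "B" are all invented for illustration.

```python
from collections import Counter

# Hypothetical toy specimens (invented values, NOT data from the study).
# Columns: seed_rows, petal_count, leaf_lobes (standing in for a plastic trait), label
SPECIMENS = [
    (1, 0, 3, "A"),
    (1, 0, 5, "A"),
    (1, 0, 1, "A"),
    (1, 4, 4, "A"),
    (2, 4, 3, "B"),
    (2, 4, 5, "B"),
    (2, 4, 1, "B"),
    (2, 4, 4, "B"),
]
TRAITS = ["seed_rows", "petal_count", "leaf_lobes"]


def fit_stump(rows, trait_names):
    """Find the single trait/threshold split with the highest training accuracy.

    Returns (trait, threshold, label_if_below_or_equal, label_if_above, accuracy).
    """
    best = None
    for j, name in enumerate(trait_names):
        values = sorted({r[j] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values,
        # so both sides of every split are non-empty.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r[-1] for r in rows if r[j] <= thr]
            right = [r[-1] for r in rows if r[j] > thr]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            correct = left.count(left_label) + right.count(right_label)
            acc = correct / len(rows)
            if best is None or acc > best[4]:
                best = (name, thr, left_label, right_label, acc)
    return best


def predict(stump, row, trait_names):
    """Classify one specimen row (trait values only) with a fitted stump."""
    trait, thr, left_label, right_label, _ = stump
    value = row[trait_names.index(trait)]
    return left_label if value <= thr else right_label


stump = fit_stump(SPECIMENS, TRAITS)
print(stump)
print(predict(stump, (2, 4, 3), TRAITS))
```

On this toy table the stump selects `seed_rows` rather than `leaf_lobes`, which mirrors the idea that a stable trait separates the labels while a plastic one does not. A real analysis would use held-out specimens and a richer model.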

Key results: Seed arrangement, petal number, and genome size exhibited clear interspecific differentiation. Phylogenomic analyses based on chloroplast genomes further resolved a species circumscription consistent with these traits. Specimen revision and the machine learning classification models indicated that initial misidentification rates reached 12-50% across virtual or physical specimens, largely due to reliance on plastic traits such as leaf shape. These errors substantially distorted spatial distribution models and future climate projections.
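A misidentification rate of this kind is simply the share of specimens whose original label changes on revision. The sketch below shows that arithmetic under stated assumptions: the function name `misidentification_rate` and the four example labels are hypothetical, and the abstract does not say how the authors computed their figures.

```python
def misidentification_rate(original, revised):
    """Fraction of specimens whose revised identification differs from the original."""
    if len(original) != len(revised):
        raise ValueError("label lists must have the same length")
    if not original:
        raise ValueError("at least one specimen is required")
    changed = sum(o != r for o, r in zip(original, revised))
    return changed / len(original)


# Invented labels for four specimens (illustration only, not study data).
original_ids = ["R. indica", "R. indica", "R. dubia", "R. indica"]
revised_ids = ["R. indica", "R. dubia", "R. dubia", "R. indica"]

rate = misidentification_rate(original_ids, revised_ids)
print(f"{rate:.0%} of specimens were reidentified")
```

Computing the rate separately for each collection or data source would give a range rather than a single value.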

Conclusions: Our findings underscore the need for secondary specimen evaluation. The framework demonstrates the importance of integrating morphological and phylogenetic inference with machine learning tools to resolve taxonomically difficult polyploid complexes. This approach offers direct applications for biodiversity assessment, evolutionary research, and conservation planning.

Source journal
Annals of Botany (Biology, Plant Sciences)
CiteScore: 7.90
Self-citation rate: 4.80%
Articles published: 138
Review time: 3 months
About the journal: Annals of Botany is an international plant science journal publishing novel and rigorous research in all areas of plant science. It is published monthly in both electronic and printed forms, with at least two extra issues each year that focus on a particular theme in plant biology. The Journal is managed by the Annals of Botany Company, a not-for-profit educational charity established to promote plant science worldwide. The Journal publishes original research papers, invited and submitted review articles, "Research in Context" expanding on original work, "Botanical Briefings" as short overviews of important topics, and "Viewpoints" giving opinions. All papers in each issue are summarized briefly in Content Snapshots, and there are topical news items in the Plant Cuttings section and Book Reviews. A rigorous review process ensures that readers are exposed to genuine and novel advances across a wide spectrum of botanical knowledge. All papers aim to advance knowledge and make a difference to our understanding of plant science.