Inferring Taxonomic Affinities and Genetic Distances Using Morphological Features Extracted from Specimen Images: a Case Study with a Bivalve dataset.

IF 6.1 1区 生物学 Q1 EVOLUTIONARY BIOLOGY
Martin Hofmann, Steffen Kiel, Lara M Kösters, Jana Wäldchen, Patrick Mäder
{"title":"Inferring Taxonomic Affinities and Genetic Distances Using Morphological Features Extracted from Specimen Images: a Case Study with a Bivalve dataset.","authors":"Martin Hofmann, Steffen Kiel, Lara M Kösters, Jana Wäldchen, Patrick Mäder","doi":"10.1093/sysbio/syae042","DOIUrl":null,"url":null,"abstract":"<p><p>Reconstructing the tree of life and understanding the relationships of taxa are core questions in evolutionary and systematic biology. The main advances in this field in the last decades were derived from molecular phylogenetics; however, for most species, molecular data are not available. Here, we explore the applicability of two deep learning methods - supervised classification approaches and unsupervised similarity learning - to infer organism relationships from specimen images. As a basis, we assembled an image dataset covering 4144 bivalve species belonging to 74 families across all orders and subclasses of the extant Bivalvia, with molecular phylogenetic data being available for all families and a complete taxonomic hierarchy for all species. The suitability of this dataset for deep learning experiments was evidenced by an ablation study resulting in almost 80% accuracy for identifications on the species level. Three sets of experiments were performed using our dataset. First, we included taxonomic hierarchy and genetic distances in a supervised learning approach to obtain predictions on several taxonomic levels simultaneously. Here, we stimulated the model to consider features shared between closely related taxa to be more critical for their classification than features shared with distantly related taxa, imprinting phylogenetic and taxonomic affinities into the architecture and training procedure. Second, we used transfer learning and similarity learning approaches for zero-shot experiments to identify the higher-level taxonomic affinities of test species that the models had not been trained on. The models assigned the unknown species to their respective genera with approximately 48% and 67% accuracy. Lastly, we used unsupervised similarity learning to infer the relatedness of the images without prior knowledge of their taxonomic or phylogenetic affinities. The results clearly showed similarities between visual appearance and genetic relationships at the higher taxonomic levels. The correlation was 0.6 for the most species-rich subclass (Imparidentia), ranging from 0.5 to 0.7 for the orders with the most images. Overall, the correlation between visual similarity and genetic distances at the family level was 0.78. However, fine-grained reconstructions based on these observed correlations, such as sister-taxa relationships, require further work. Overall, our results broaden the applicability of automated taxon identification systems and provide a new avenue for estimating phylogenetic relationships from specimen images.</p>","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":null,"pages":null},"PeriodicalIF":6.1000,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Systematic Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/sysbio/syae042","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EVOLUTIONARY BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Reconstructing the tree of life and understanding the relationships of taxa are core questions in evolutionary and systematic biology. The main advances in this field in the last decades were derived from molecular phylogenetics; however, for most species, molecular data are not available. Here, we explore the applicability of two deep learning methods - supervised classification approaches and unsupervised similarity learning - to infer organism relationships from specimen images. As a basis, we assembled an image dataset covering 4144 bivalve species belonging to 74 families across all orders and subclasses of the extant Bivalvia, with molecular phylogenetic data being available for all families and a complete taxonomic hierarchy for all species. The suitability of this dataset for deep learning experiments was evidenced by an ablation study resulting in almost 80% accuracy for identifications on the species level. Three sets of experiments were performed using our dataset. First, we included taxonomic hierarchy and genetic distances in a supervised learning approach to obtain predictions on several taxonomic levels simultaneously. Here, we stimulated the model to consider features shared between closely related taxa to be more critical for their classification than features shared with distantly related taxa, imprinting phylogenetic and taxonomic affinities into the architecture and training procedure. Second, we used transfer learning and similarity learning approaches for zero-shot experiments to identify the higher-level taxonomic affinities of test species that the models had not been trained on. The models assigned the unknown species to their respective genera with approximately 48% and 67% accuracy. Lastly, we used unsupervised similarity learning to infer the relatedness of the images without prior knowledge of their taxonomic or phylogenetic affinities. The results clearly showed similarities between visual appearance and genetic relationships at the higher taxonomic levels. The correlation was 0.6 for the most species-rich subclass (Imparidentia), ranging from 0.5 to 0.7 for the orders with the most images. Overall, the correlation between visual similarity and genetic distances at the family level was 0.78. However, fine-grained reconstructions based on these observed correlations, such as sister-taxa relationships, require further work. Overall, our results broaden the applicability of automated taxon identification systems and provide a new avenue for estimating phylogenetic relationships from specimen images.

利用从标本图像中提取的形态学特征推断分类亲缘关系和遗传距离:双壳类动物数据集案例研究。
重建生命树和了解类群关系是进化生物学和系统生物学的核心问题。过去几十年来,这一领域的主要进展来自分子系统学;然而,对于大多数物种来说,分子数据是不可用的。在这里,我们探索了两种深度学习方法--监督分类方法和无监督相似性学习--在从标本图像推断生物关系方面的适用性。在此基础上,我们建立了一个图像数据集,涵盖了现存双壳纲所有目和亚目的 74 个科的 4144 个双壳类物种,所有科都有分子系统发生学数据,所有物种都有完整的分类层次结构。通过一项消融研究,该数据集的物种识别准确率接近 80%,这证明了该数据集适合进行深度学习实验。使用我们的数据集进行了三组实验。首先,我们在监督学习方法中加入了分类层次和遗传距离,以同时获得多个分类层次的预测结果。在此,我们激励模型考虑近缘类群之间共享的特征比远缘类群共享的特征对其分类更为重要,从而将系统发生学和分类学亲缘关系印刻到结构和训练程序中。其次,我们利用迁移学习和相似性学习方法进行了零次实验,以确定模型未训练过的测试物种的高层分类亲缘关系。模型将未知物种归入各自属的准确率分别为 48% 和 67%。最后,我们使用无监督相似性学习来推断图像的亲缘关系,而无需事先了解其分类学或系统发育亲缘关系。结果清楚地表明,在较高的分类水平上,视觉外观与遗传关系之间存在相似性。物种最丰富的亚纲(Imparidentia)的相关性为 0.6,图像最多的目为 0.5 至 0.7。总体而言,视觉相似性与科级遗传距离之间的相关性为 0.78。然而,基于这些观察到的相关性进行细粒度重建,如姐妹-同属关系,还需要进一步的工作。总之,我们的研究结果拓宽了自动分类鉴定系统的适用范围,并为从标本图像中估计系统发育关系提供了一条新途径。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Systematic Biology
Systematic Biology 生物-进化生物学
CiteScore
13.00
自引率
7.70%
发文量
70
审稿时长
6-12 weeks
期刊介绍: Systematic Biology is the bimonthly journal of the Society of Systematic Biologists. Papers for the journal are original contributions to the theory, principles, and methods of systematics as well as phylogeny, evolution, morphology, biogeography, paleontology, genetics, and the classification of all living things. A Points of View section offers a forum for discussion, while book reviews and announcements of general interest are also featured.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信