Common to rare transfer learning (CORAL) enables inference and prediction for a quarter million rare Malagasy arthropods

IF 32.1, Zone 1 (Biology), JCR Q1 (Biochemical Research Methods)
Otso Ovaskainen, Steven Winter, Gleb Tikhonov, Nerea Abrego, Sten Anslan, Jeremy R. deWaard, Stephanie L. deWaard, Brian L. Fisher, Brendan Furneaux, Bess Hardwick, Deirdre Kerdraon, Mikko Pentinsaari, Dimby Raharinjanahary, Eric Tsiriniaina Rajoelison, Sujeevan Ratnasingham, Panu Somervuo, Jayme E. Sones, Evgeny V. Zakharov, Paul D. N. Hebert, Tomas Roslin, David Dunson
Nature Methods, volume 22, issue 10, pages 2074-2082. Journal article, published 2025-09-23. DOI: 10.1038/s41592-025-02823-y
https://www.nature.com/articles/s41592-025-02823-y

Abstract

DNA-based biodiversity surveys produce massive datasets that can include up to millions of species, most of which are rare. Making the most of such data for inference and prediction requires modeling approaches that relate species occurrences to environmental and spatial predictors while incorporating information about taxonomic or phylogenetic placement. Although the scalability of joint species distribution models to large communities has greatly advanced, incorporating hundreds of thousands of species has not been feasible to date, leading to compromised analyses. Here we present a ‘common to rare transfer learning’ (CORAL) approach, which borrows information from the common species to enable statistically and computationally efficient modeling of both common and rare species. We illustrate that CORAL leads to much improved prediction and inference in the context of DNA metabarcoding data from Madagascar, comprising 255,188 arthropod species detected in 2,874 samples. CORAL can infer the occurrence of rare species based on common species, using DNA metabarcoding data or other high-dimensional biodiversity data. The approach is illustrated on a large-scale biodiversity survey from Madagascar.
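
The abstract does not spell out the CORAL model, so the following is only a toy sketch of the general idea of borrowing information from common species, not the authors' method. It assumes each species has a single environmental response (a slope), that slopes estimated for common species define a normal prior, and that a rare species' noisy estimate is shrunk toward that prior by precision weighting. All names and numbers (`common_slopes`, `rare_estimate`, `shrink`) are hypothetical.

```python
from statistics import mean, variance


def shrink(estimate, est_var, prior_mean, prior_var):
    """Normal-normal update: combine a noisy estimate with a prior.

    Returns the posterior mean and variance of the slope.
    """
    # Weight on the species' own data; small when its estimate is noisy.
    w = prior_var / (prior_var + est_var)
    post_mean = w * estimate + (1.0 - w) * prior_mean
    post_var = w * est_var
    return post_mean, post_var


# Hypothetical slopes estimated reliably for four common species.
common_slopes = [0.8, 1.1, 0.9, 1.2]

# Prior learned from the common species (assumption: slopes are
# exchangeable across species; a real model could condition on taxonomy).
prior_mean = mean(common_slopes)
prior_var = variance(common_slopes)

# Hypothetical rare species: few detections, so a very uncertain estimate.
rare_estimate, rare_var = 3.0, 1.0

post_mean, post_var = shrink(rare_estimate, rare_var, prior_mean, prior_var)
print(f"prior mean {prior_mean:.3f}, posterior mean {post_mean:.3f}")
```

Because the rare species' own estimate is much noisier than the prior, its posterior slope lands close to the common-species mean while still moving slightly toward its own data. A real analysis would replace the single slope with a full set of species-level parameters.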


Source journal
Nature Methods (Biology: Biochemical Research Methods)
CiteScore: 58.70
Self-citation rate: 1.70%
Articles published: 326
Review time: 1 month
About the journal: Nature Methods is a monthly journal that focuses on publishing innovative methods and substantial enhancements to fundamental life sciences research techniques. Geared towards a diverse, interdisciplinary readership of researchers in academia and industry engaged in laboratory work, the journal offers new tools for research and emphasizes the immediate practical significance of the featured work. It publishes primary research papers and reviews recent technical and methodological advancements, with a particular interest in primary methods papers relevant to the biological and biomedical sciences. This includes methods rooted in chemistry with practical applications for studying biological problems.