Otso Ovaskainen, Steven Winter, Gleb Tikhonov, Nerea Abrego, Sten Anslan, Jeremy R. deWaard, Stephanie L. deWaard, Brian L. Fisher, Brendan Furneaux, Bess Hardwick, Deirdre Kerdraon, Mikko Pentinsaari, Dimby Raharinjanahary, Eric Tsiriniaina Rajoelison, Sujeevan Ratnasingham, Panu Somervuo, Jayme E. Sones, Evgeny V. Zakharov, Paul D. N. Hebert, Tomas Roslin, David Dunson
Common to rare transfer learning (CORAL) enables inference and prediction for a quarter million rare Malagasy arthropods
Nature Methods, volume 22, issue 10, pages 2074-2082. Published 2025-09-23. DOI: 10.1038/s41592-025-02823-y
https://www.nature.com/articles/s41592-025-02823-y
Abstract
DNA-based biodiversity surveys produce massive-scale data covering up to millions of species, most of which are rare. Making the most of such data for inference and prediction requires modeling approaches that can relate species occurrences to environmental and spatial predictors while incorporating information about their taxonomic or phylogenetic placement. Although the scalability of joint species distribution models to large communities has greatly advanced, incorporating hundreds of thousands of species has not been feasible to date, leading to compromised analyses. Here we present a ‘common to rare transfer learning’ (CORAL) approach, which borrows information from the common species to enable statistically and computationally efficient modeling of both common and rare species. We show that CORAL leads to much improved prediction and inference for DNA metabarcoding data from Madagascar, comprising 255,188 arthropod species detected in 2,874 samples. CORAL can infer the occurrence of rare species based on common species, using DNA metabarcoding data or other high-dimensional biodiversity data. The approach is illustrated on a large-scale biodiversity survey from Madagascar.
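The idea of borrowing information from common species can be illustrated with a toy example. The sketch below is not the authors' CORAL implementation. It assumes a single simulated environmental covariate, an independent logistic regression per species, and a Gaussian prior on the rare species' slope whose mean and variance are taken from the fitted common species. All names (`fit_map_logistic`, `simulate`) and all simulation settings are hypothetical, and the taxonomic and spatial components mentioned in the abstract are left out.

```python
import math
import random


def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neg_log_post(b, x, y, mean, var):
    """Negative log posterior: logistic likelihood plus independent Gaussian priors."""
    nll = sum(log1pexp(b[0] + b[1] * xi) - yi * (b[0] + b[1] * xi)
              for xi, yi in zip(x, y))
    penalty = sum((bj - mj) ** 2 / (2.0 * vj) for bj, mj, vj in zip(b, mean, var))
    return nll + penalty


def fit_map_logistic(x, y, mean, var, iters=25):
    """MAP estimate of (intercept, slope) by Newton's method with step halving."""
    b = (float(mean[0]), float(mean[1]))
    for _ in range(iters):
        # Gradient and Hessian of the negative log posterior at b.
        g0 = (b[0] - mean[0]) / var[0]
        g1 = (b[1] - mean[1]) / var[1]
        h00, h01, h11 = 1.0 / var[0], 0.0, 1.0 / var[1]
        for xi, yi in zip(x, y):
            p = sigmoid(b[0] + b[1] * xi)
            w = p * (1.0 - p)
            g0 += p - yi
            g1 += (p - yi) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01  # positive because the prior adds to the diagonal
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        # Halve the Newton step until the objective does not increase.
        step, current = 1.0, neg_log_post(b, x, y, mean, var)
        while step > 1e-8:
            trial = (b[0] - step * d0, b[1] - step * d1)
            if neg_log_post(trial, x, y, mean, var) <= current:
                b = trial
                break
            step /= 2.0
    return b


def simulate(x, intercept, slope, rng):
    """Simulated presence-absence data for one species (assumed logistic response)."""
    return [1 if rng.random() < sigmoid(intercept + slope * xi) else 0 for xi in x]


rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(300)]  # one covariate value per sample

# Step 1: fit each common species separately under a vague prior.
vague_mean, vague_var = (0.0, 0.0), (100.0, 100.0)
common_slopes = []
for _ in range(20):
    y = simulate(x, rng.gauss(0.0, 0.5), rng.gauss(1.0, 0.3), rng)
    common_slopes.append(fit_map_logistic(x, y, vague_mean, vague_var)[1])

# Step 2: summarise the common-species slopes as a Gaussian prior.
m = sum(common_slopes) / len(common_slopes)
v = sum((s - m) ** 2 for s in common_slopes) / (len(common_slopes) - 1)

# Step 3: fit a rare species with and without the transferred slope prior.
y_rare = simulate(x, -4.0, 1.0, rng)
b_vague = fit_map_logistic(x, y_rare, vague_mean, vague_var)
b_transfer = fit_map_logistic(x, y_rare, (0.0, m), (100.0, v))
print("detections of the rare species:", sum(y_rare))
print("slope under the vague prior:", round(b_vague[1], 3))
print("slope under the transferred prior:", round(b_transfer[1], 3))
```

Only the slope prior is transferred in this sketch. The intercept prior stays vague, because prevalence is exactly what separates a rare species from a common one.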
About the journal:
Nature Methods is a monthly journal that focuses on publishing innovative methods and substantial enhancements to fundamental life sciences research techniques. Geared towards a diverse, interdisciplinary readership of researchers in academia and industry engaged in laboratory work, the journal offers new tools for research and emphasizes the immediate practical significance of the featured work. It publishes primary research papers and reviews recent technical and methodological advancements, with a particular interest in primary methods papers relevant to the biological and biomedical sciences. This includes methods rooted in chemistry with practical applications for studying biological problems.