Johanna de Haan-Ward, Simon J. Bonner, Douglas G. Woolford
Environmetrics, volume 36, issue 5. Published 2025-07-09. DOI: 10.1002/env.70023. https://onlinelibrary.wiley.com/doi/10.1002/env.70023
Occupancy Modeling for Rare Species Using Large Datasets: A Subsampling Approach
Citizen science monitoring programs, such as the Breeding Bird Survey, provide a wealth of data for understanding species abundance and distribution. However, traditional approaches for occupancy modeling of rare species can be difficult to apply to large, imbalanced datasets. We propose a new method for occupancy modeling where the original dataset is subsampled seasonally, keeping all sites with at least one detection along with a random sample of sites with no detections. Occupancy models cannot be fit directly to these subsampled data because the assumption of binomial sampling no longer holds. However, we show that the occupancy probability is adjusted by an offset, meaning inference on the effects of predictors is still valid. We propose a method for model fitting via direct maximum likelihood and demonstrate via simulation that this leads to computational gains. We illustrate our method using data on Canada Warblers (Cardellina canadensis) from the Breeding Bird Survey in Ontario, Canada from 1997 to 2018, where 95% of sites have zero detections annually, demonstrating that we can accurately estimate the occupancy and detection parameters, including estimating the effects of habitat covariates, using just 10% of the original dataset.
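The subsampling step and the reason a standard occupancy likelihood no longer applies can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation. It assumes constant occupancy probability `psi` and detection probability `p` (no covariates), a known retention probability `keep_fraction` applied independently to each all-zero site, and hypothetical function names `subsample_sites` and `conditional_loglik`. The likelihood below conditions directly on a site being retained; the offset formulation described in the abstract may be parameterized differently.

```python
import math
import random


def subsample_sites(histories, keep_fraction, seed=0):
    """Return indices of retained sites for one season.

    Every site with at least one detection is kept. Each all-zero site is
    kept independently with probability keep_fraction (an assumed design).
    """
    rng = random.Random(seed)
    kept = []
    for i, history in enumerate(histories):
        if any(history) or rng.random() < keep_fraction:
            kept.append(i)
    return kept


def conditional_loglik(psi, p, histories, keep_fraction):
    """Log-likelihood of retained detection histories, given retention.

    Assumes 0 < psi < 1, 0 < p < 1 and keep_fraction > 0.
    """
    total = 0.0
    for history in histories:
        n_visits = len(history)
        n_det = sum(history)
        # Probability of an all-zero history under the usual occupancy model:
        # the site is unoccupied, or occupied but missed on every visit.
        p_all_zero = (1 - psi) + psi * (1 - p) ** n_visits
        # Probability that a site ends up in the subsample at all.
        p_retained = (1 - p_all_zero) + keep_fraction * p_all_zero
        if n_det > 0:
            p_history = psi * p ** n_det * (1 - p) ** (n_visits - n_det)
        else:
            p_history = keep_fraction * p_all_zero
        total += math.log(p_history / p_retained)
    return total


# Toy data: 3 visits per site, only the first two sites have detections.
histories = [[1, 0, 1], [0, 1, 0]] + [[0, 0, 0]] * 18
kept = subsample_sites(histories, keep_fraction=0.1, seed=1)
subsample = [histories[i] for i in kept]
print(len(kept), conditional_loglik(0.2, 0.5, subsample, keep_fraction=0.1))
```

When `keep_fraction` is 1 the retention term equals 1 and the expression reduces to the ordinary occupancy log-likelihood, which is a convenient sanity check. In practice the function would be passed to a numerical optimizer to obtain maximum likelihood estimates.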
About the journal:
Environmetrics, the official journal of The International Environmetrics Society (TIES), an Association of the International Statistical Institute, is devoted to the dissemination of high-quality quantitative research in the environmental sciences.
The journal welcomes pertinent and innovative submissions from quantitative disciplines developing new statistical and mathematical techniques, methods, and theories that solve modern environmental problems. Articles must proffer substantive, new statistical or mathematical advances to answer important scientific questions in the environmental sciences, or must develop novel or enhanced statistical methodology with clear applications to environmental science. New methods should be illustrated with recent environmental data.