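Two-stage models improve machine learning classifiers in wildlife research: A case study in identifying false positive detections of Ruffed Grouse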

IF 5.8; Zone 2, Environmental Science and Ecology; JCR Q1, ECOLOGY

Ecological Informatics, Volume 89, Article 103166. Published: 2025-04-30. DOI: 10.1016/j.ecoinf.2025.103166

Laurence A. Clarfeld, Katherina D. Gieder, Robert Abrams, Christopher Bernier, Joseph Cahill, Susan Staats, Scott Wixsom, Therese M. Donovan

Citations: 0

Abstract

Autonomous recording units are increasingly used to monitor wildlife over large geographic and temporal scales, paired with machine learning (ML) to automate wildlife detection. However, false positive detections from ML classifiers can produce erroneous ecological models, which in turn can lead to misguided management and conservation actions. We used a general two-stage approach to understand and reduce false positive detections: outputs of the primary classification model are passed to a secondary classification model, which yields the probability that a detection from the primary model is a true positive. We demonstrate this approach on two open-source models that detect Ruffed Grouse (Bonasa umbellus). We analyzed over 9500 h of acoustic data collected in 2022–2023 in the Green Mountain National Forest in Vermont, USA, and found that the two models detected different types of acoustic signals associated with different life history traits. The first model yielded 4106 detections (71.5 % true positives), while the second yielded 524 detections (17.0 % true positives). Secondary logistic regression models separated true positives from false positives with high accuracy (84.5 % and 89.8 %, respectively). Our findings go beyond improving Ruffed Grouse monitoring and conservation efforts to illustrate, more broadly, how two-stage ML approaches can improve the use of model-derived detections in wildlife research.
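The abstract does not state which covariates or fitting procedure the secondary models used. The following is a minimal sketch of the general two-stage idea only, assuming a logistic regression fitted by plain gradient descent to stage-1 detections that a human has already labeled as true or false positives. The `Detection` fields (`score`, `hour`), every function name, the 0.5 threshold, and the synthetic data are hypothetical illustrations, not the authors' variables or results.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Detection:
    """One detection emitted by the primary (stage-1) classifier.

    Field names are hypothetical; the study's actual covariates may differ.
    """
    score: float                  # stage-1 confidence score in [0, 1]
    hour: float                   # hour of day the clip was recorded (0-24), assumed covariate
    label: Optional[int] = None   # manual review: 1 = true positive, 0 = false positive


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def features(d: Detection) -> List[float]:
    # Stage-2 inputs: the stage-1 score plus one assumed covariate, scaled to [0, 1].
    return [d.score, d.hour / 24.0]


def fit_logistic(X: Sequence[List[float]], y: Sequence[int],
                 lr: float = 1.0, epochs: int = 2000) -> Tuple[List[float], float]:
    # Batch gradient descent on the mean log-loss (no regularization).
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(k):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def p_true_positive(w: List[float], b: float, d: Detection) -> float:
    # Stage-2 output: estimated probability that a stage-1 detection is a true positive.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, features(d))))


def two_stage_filter(dets: Sequence[Detection], w: List[float], b: float,
                     threshold: float = 0.5) -> List[Tuple[Detection, float]]:
    # Keep only stage-1 detections whose stage-2 probability meets the threshold.
    scored = [(d, p_true_positive(w, b, d)) for d in dets]
    return [(d, p) for d, p in scored if p >= threshold]


def synthetic_reviewed_detections(n: int, seed: int = 0) -> List[Detection]:
    # Fabricated demo data, NOT from the study: true positives tend to score higher.
    rng = random.Random(seed)
    out = []
    for i in range(n):
        label = i % 2
        score = rng.uniform(0.5, 1.0) if label else rng.uniform(0.0, 0.7)
        out.append(Detection(score=score, hour=rng.uniform(0, 24), label=label))
    return out


if __name__ == "__main__":
    train = synthetic_reviewed_detections(200)
    w, b = fit_logistic([features(d) for d in train], [d.label for d in train])
    held_out = synthetic_reviewed_detections(200, seed=1)
    correct = sum((p_true_positive(w, b, d) >= 0.5) == bool(d.label) for d in held_out)
    print(f"stage-2 accuracy on held-out synthetic data: {correct / len(held_out):.3f}")
    kept = two_stage_filter(held_out, w, b)
    print(f"kept {len(kept)} of {len(held_out)} stage-1 detections")
```

In practice the threshold is a design choice: raising it discards more false positives at the cost of also dropping some true detections, so it would be tuned to whatever downstream ecological model consumes the filtered detections.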

Source journal

Ecological Informatics (Environmental Science: Ecology)

CiteScore: 8.30
Self-citation rate: 11.80%
Articles published: 346
Review time: 46 days

About the journal: The journal Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data, and the critical need for informing sustainable management in view of global environmental and climate change. The journal is interdisciplinary, at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, and modelling and prediction of ecological data.