Utilizing the random forest algorithm and interpretable machine learning to inform post-stratification of commercial fisheries data
Jason Gasper, Jennifer Cahalan
Fisheries Research, Volume 281, Article 107253 (2025)
DOI: 10.1016/j.fishres.2024.107253
https://www.sciencedirect.com/science/article/pii/S0165783624003175
Abstract
Federal groundfish fisheries off Alaska are managed using near-real-time estimates of catch. These estimates combine industry-reported information with data from the North Pacific Groundfish and Pacific Halibut Observer Program, which deploys observers and Electronic Monitoring systems into the fisheries to sample catch. Catch is carefully monitored against limits that are based on biological constraints or quota allocations, or that are set to control discard amounts. However, estimates of fish discarded at sea (not retained for sale) can have large variance due to factors such as fishing behavior, species-specific vulnerability to fishing, and sample size. Post-stratification is a statistical approach widely used to improve the precision of catch estimates within a population because it can reduce variance without relying on covariates known before sampling, which can be costly to collect or unavailable. Strategic use of post-stratification may increase the precision of estimates compared with designs that do not use it. However, choosing the fishery characteristics that define post-strata can be difficult because of the high dimensionality of fishery data and the complexity of creating post-strata optimized for multiple species. We propose a novel application of random forest classification and design-based estimation to explore multivariate post-stratification designs. These designs were evaluated by using design-based estimation metrics to select the best-performing trees from an ensemble. Results showed a large improvement in the precision of estimates when the best-performing trees were used to label data and create post-strata. Moreover, when the best-performing trees were evaluated with subject matter expertise, the method identified combinations of covariates that had not been considered in previous estimation designs, and it allows alternative post-strata designs that could be implemented in a management system to be explored and tested.
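The core idea of the abstract can be sketched as follows: treat each candidate grouping of the data (for example, the leaves of one tree) as a set of post-strata, and keep the candidate with the best design-based precision. This is a minimal illustration, not the authors' implementation. It assumes simple random sampling of trips, a post-stratified expansion estimator of a population total with the usual conditional variance (with finite population correction), and the estimated coefficient of variation (CV) as the selection metric. The function names, the toy trips, and the rule that rejects any post-stratum with fewer than two observed trips are all hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def post_stratified_total(labels, sample_idx, sample_y):
    """Post-stratified expansion estimate of a population total (a sketch).

    labels     -- post-stratum label for every population unit (e.g. every trip)
    sample_idx -- population indices of the sampled (observed) units
    sample_y   -- observed value (e.g. discard weight) for each sampled unit
    Estimator: T = sum_h N_h * ybar_h
    Variance:  V = sum_h N_h^2 * (1 - n_h / N_h) * s_h^2 / n_h
    Returns (T, V), or None when a post-stratum has fewer than 2 samples,
    because its within-stratum variance s_h^2 cannot then be estimated.
    """
    N_h = defaultdict(int)  # population count per post-stratum
    for h in labels:
        N_h[h] += 1
    y_h = defaultdict(list)  # observed values per post-stratum
    for i, y in zip(sample_idx, sample_y):
        y_h[labels[i]].append(y)

    total, var = 0.0, 0.0
    for h, N in N_h.items():
        ys = y_h[h]
        n = len(ys)
        if n < 2:
            return None  # hypothetical rule: reject infeasible post-strata
        total += N * mean(ys)
        var += N * N * (1 - n / N) * variance(ys) / n
    return total, var


def best_post_stratification(candidates, sample_idx, sample_y):
    """Return (name, cv, total) for the candidate labeling with the lowest CV."""
    best = None
    for name, labels in candidates.items():
        est = post_stratified_total(labels, sample_idx, sample_y)
        if est is None or est[0] <= 0:
            continue  # skip infeasible designs and undefined CVs
        total, var = est
        cv = sqrt(var) / total
        if best is None or cv < best[1]:
            best = (name, cv, total)
    return best


# Hypothetical population of 12 trips: (gear, area, discard weight).
trips = [
    ("trawl", "A", 100), ("trawl", "B", 104), ("trawl", "A", 98),
    ("trawl", "B", 102), ("trawl", "A", 96), ("trawl", "B", 100),
    ("longline", "A", 10), ("longline", "B", 12), ("longline", "A", 8),
    ("longline", "B", 11), ("longline", "A", 9), ("longline", "B", 10),
]
sample_idx = [0, 1, 2, 7, 8, 9]  # trips that were observed
sample_y = [trips[i][2] for i in sample_idx]

# Candidate post-stratifications: one label per population unit.
candidates = {
    "none": ["all"] * len(trips),
    "gear": [g for g, _, _ in trips],
    "area": [a for _, a, _ in trips],
    "gear_x_area": [(g, a) for g, a, _ in trips],
}
best = best_post_stratification(candidates, sample_idx, sample_y)
print(best)
```

With this toy data the gear-based post-strata are selected, and the gear-by-area candidate is rejected because some cells contain only one observed trip, which mirrors the trade-off between finer post-strata and sample size. Here the candidates are hand-built. In a random forest setting, each column of `forest.apply(X_pop)` for a fitted scikit-learn `RandomForestClassifier` gives every population unit's leaf index in one tree and could be passed in as one candidate labeling. The abstract does not specify what the forest is trained to classify, so that step is left out of the sketch.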
About the journal:
This journal provides an international forum for the publication of papers in the areas of fisheries science, fishing technology, fisheries management and relevant socio-economics. The scope covers fisheries in salt, brackish and freshwater systems, and all aspects of associated ecology, environmental aspects of fisheries, and economics. Both theoretical and practical papers are acceptable, including laboratory and field experimental studies relevant to fisheries. Papers on the conservation of exploitable living resources are welcome. Review and Viewpoint articles are also published. As the specified areas inevitably impinge on and interrelate with each other, the approach of the journal is multidisciplinary, and authors are encouraged to emphasise the relevance of their own work to that of other disciplines. The journal is intended for fisheries scientists, biological oceanographers, gear technologists, economists, managers, administrators, policy makers and legislators.