Random forest regression models in ecology: Accounting for messy biological data and producing predictions with uncertainty
Caitlin I. Allen Akselrud
Fisheries Research, volume 280, Article 107161. Published 2024-09-06.
DOI: 10.1016/j.fishres.2024.107161
https://www.sciencedirect.com/science/article/pii/S016578362400225X
Citations: 0
Abstract
Machine learning methods such as random forest regression models are useful tools in ecology when applied correctly, but features inherent to ecological data sets can lead to over-fitting or uncertain predictions. Here, a set of methods is outlined to account for temporal autocorrelation and for sparse, short, or missing data in random forest predictions. Methods are also provided for estimating the prediction uncertainty that arises from the combination of inherent randomness in the random forest algorithm and sparse input data. This suite of methods was used to generate pre-season predictions of total catches, with uncertainty, for California market squid (Doryteuthis opalescens), the most valuable fishery in California by ex-vessel value. The methodology presented in this analysis is robust, incorporating key cross-validation and hyperparameter tuning techniques from across disciplines, and flexible, making it applicable to various ecological and fisheries datasets beyond market squid.
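The abstract does not spell out the procedures, so the following is only a minimal sketch of two generic ideas it alludes to, not the author's implementation. It shows (1) rolling-origin cross-validation, where training years always precede the test year so temporal autocorrelation does not leak into validation, and (2) an empirical interval taken across repeated fits with different random seeds, which captures only the algorithm's own randomness. Everything named here is an assumption for illustration: the functions `rolling_origin_splits`, `bagged_nn_predict`, and `seed_interval`, the toy bagged nearest-neighbour predictor standing in for a random forest, and the synthetic series.

```python
import random
import statistics


def rolling_origin_splits(n_obs, min_train, horizon=1):
    """Yield (train_idx, test_idx) pairs; training indices always precede test indices."""
    for end in range(min_train, n_obs - horizon + 1):
        yield list(range(end)), list(range(end, end + horizon))


def bagged_nn_predict(x_train, y_train, x_new, seed, n_boot=100):
    """Toy stand-in for a random forest (hypothetical): average the response of the
    nearest neighbour found in each bootstrap resample of the training data."""
    rng = random.Random(seed)
    indices = range(len(x_train))
    preds = []
    for _ in range(n_boot):
        sample = rng.choices(indices, k=len(x_train))  # bootstrap resample
        nearest = min(sample, key=lambda i: abs(x_train[i] - x_new))
        preds.append(y_train[nearest])
    return statistics.fmean(preds)


def seed_interval(x_train, y_train, x_new, seeds, lower=0.05, upper=0.95):
    """Refit with several seeds; return (median, lower, upper) of the predictions."""
    preds = sorted(bagged_nn_predict(x_train, y_train, x_new, s) for s in seeds)
    lo = preds[int(lower * (len(preds) - 1))]
    hi = preds[int(upper * (len(preds) - 1))]
    return statistics.median(preds), lo, hi


# Synthetic, made-up series purely for illustration (not squid catch data).
x = [float(t % 7) for t in range(30)]               # hypothetical covariate
y = [2.0 * xi + 0.1 * t for t, xi in enumerate(x)]  # hypothetical response

splits = list(rolling_origin_splits(len(y), min_train=20))
results = []
for train_idx, test_idx in splits:
    x_tr = [x[i] for i in train_idx]
    y_tr = [y[i] for i in train_idx]
    for j in test_idx:
        median, lo, hi = seed_interval(x_tr, y_tr, x[j], seeds=range(25))
        results.append({"index": j, "observed": y[j],
                        "median": median, "lower": lo, "upper": hi})

mean_abs_error = statistics.fmean(abs(r["median"] - r["observed"]) for r in results)
print(f"{len(splits)} forward-chaining folds, mean absolute error {mean_abs_error:.3f}")
```

In practice the toy predictor would be replaced by a real random forest implementation. The seed-to-seed spread shown here would likely understate total uncertainty, since it ignores the contribution of sparse input data that the abstract also mentions.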
About the journal:
This journal provides an international forum for the publication of papers in the areas of fisheries science, fishing technology, fisheries management and relevant socio-economics. The scope covers fisheries in salt, brackish and freshwater systems, and all aspects of associated ecology, environmental aspects of fisheries, and economics. Both theoretical and practical papers are acceptable, including laboratory and field experimental studies relevant to fisheries. Papers on the conservation of exploitable living resources are welcome. Review and Viewpoint articles are also published. As the specified areas inevitably impinge on and interrelate with each other, the approach of the journal is multidisciplinary, and authors are encouraged to emphasise the relevance of their own work to that of other disciplines. The journal is intended for fisheries scientists, biological oceanographers, gear technologists, economists, managers, administrators, policy makers and legislators.