Interpretable fish abundance index prediction in tuna longline fisheries: A LightGBM-SHAP case study in the tropical Atlantic Ocean
Authors: Linhui Wang, Liming Song, Hengshou Sui, Bin Li
Journal: Fisheries Research, volume 288, Article 107468
Publication date: 2025-07-11
DOI: 10.1016/j.fishres.2025.107468
URL: https://www.sciencedirect.com/science/article/pii/S016578362500205X
Journal impact factor: 2.3 (JCR Q2, Fisheries)
Citations: 0
Abstract
Accurately predicting fish abundance indices is crucial for sustainable fisheries management. This study focuses on three highly migratory fish species in the tropical Atlantic Ocean (TAO): bigeye tuna (Thunnus obesus), yellowfin tuna (Thunnus albacares), and swordfish (Xiphias gladius). Using tuna longline logbook data from 2016 to 2019 and several environmental datasets, we applied four feature selection methods: no processing, correlation analysis with multicollinearity diagnosis, traditional Principal Component Analysis (PCA), and stratified PCA. Seven predictive models for abundance indices were compared to identify the best-performing modeling framework and feature engineering method. An interpretable LightGBM-SHAP model was then developed to predict catch per unit effort (CPUE) while quantifying the relative contributions of key environmental drivers. The framework's spatial applicability was verified using the kernel density estimation (KDE) tool, Moran's I index, and two correlation analyses. Using raw environmental variables without dimensionality reduction yielded the best predictive performance (R² > 0.84 for all species), underscoring the need for context-appropriate feature selection. Spatial validation confirmed strong concordance between SHAP-derived predictions and observed CPUE distributions. The most influential species-specific factors were: (1) month, longitude, and latitude for bigeye tuna; (2) latitude, month, and D250 for yellowfin tuna; (3) latitude, month, and D450 for swordfish. This study provides a comprehensive framework for predicting fish abundance indices and interpreting the underlying environmental factors, thereby improving the interpretability of machine learning models in fisheries forecasting. The findings can help fisheries managers identify potential fishing zones, adjust management strategies, and promote the sustainable use of fisheries resources in the TAO.
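The abstract names Moran's I as one of the tools used to check spatial applicability but does not say how it was computed. As a rough illustration only, the sketch below computes the global Moran's I statistic, I = (n / W) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², on a toy grid. The function names `morans_i` and `rook_weights`, the 4 × 4 grid, the binary rook-adjacency weights, and the example values are all assumptions made for this example; they are not the study's data, weighting scheme, or software.

```python
def rook_weights(n_rows, n_cols):
    """Binary rook-adjacency weights for a regular grid (cells indexed row-major).

    Hypothetical helper: the weighting scheme used in the study is not stated.
    """
    n = n_rows * n_cols
    w = [[0.0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            # Neighbours share an edge: up, down, left, right.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[i][rr * n_cols + cc] = 1.0
    return w


def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    denom = sum(d * d for d in dev)
    if w_sum == 0 or denom == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    # Weighted cross-products of deviations between every pair of cells.
    num = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / w_sum) * (num / denom)


# Toy 4 x 4 grids of made-up CPUE-like values (illustrative, not real data).
w = rook_weights(4, 4)
clustered = [1.0, 1.0, 0.0, 0.0] * 4                       # high values on one side
checker = [float((r + c) % 2) for r in range(4) for c in range(4)]  # alternating cells

i_clustered = morans_i(clustered, w)  # positive: similar values sit together
i_checker = morans_i(checker, w)      # negative: neighbours always differ
print(i_clustered, i_checker)
```

A positive value indicates that similar values cluster in space and a negative value indicates that neighbouring cells tend to differ. Real applications would typically use distance-based or row-standardised weights and a permutation test for significance, neither of which is shown here.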
About the journal:
This journal provides an international forum for the publication of papers in the areas of fisheries science, fishing technology, fisheries management and relevant socio-economics. The scope covers fisheries in salt, brackish and freshwater systems, and all aspects of associated ecology, environmental aspects of fisheries, and economics. Both theoretical and practical papers are acceptable, including laboratory and field experimental studies relevant to fisheries. Papers on the conservation of exploitable living resources are welcome. Review and Viewpoint articles are also published. As the specified areas inevitably impinge on and interrelate with each other, the approach of the journal is multidisciplinary, and authors are encouraged to emphasise the relevance of their own work to that of other disciplines. The journal is intended for fisheries scientists, biological oceanographers, gear technologists, economists, managers, administrators, policy makers and legislators.