Identifying optimal variables for machine-learning-based fish distribution modeling
Authors: Shaohua Xu, Jintao Wang, Xinjun Chen, Jiangfeng Zhu
Journal: Canadian Journal of Fisheries and Aquatic Sciences
Publication date: 2024-03-05
DOI: https://doi.org/10.1139/cjfas-2023-0197
Canadian Journal of Fisheries and Aquatic Sciences, Ahead of Print. Machine learning occupies a central position in the modeling of fish distribution patterns. As observational methods multiply the explanatory variables available for describing fish habitat, it becomes necessary to identify an optimal combination of these variables for distribution modeling. We proposed a feature selection technique, recursive feature elimination with cross-validation (RFECV), to determine optimal variable combinations for modeling yellowfin tuna distribution in the Pacific Ocean. Four tree-based models driven by RFECV (random forest, eXtreme Gradient Boosting, Light Gradient Boosting Machine, and categorical boosting) were developed using comprehensive fisheries and biotic/abiotic data. Habitat variables including sea temperature, dissolved oxygen concentration, chlorophyll-a concentration, sea salinity, and sea surface height were identified as significant features by all models. Each model was trained on its own selected variables, and the trained models were then used to predict the spatiotemporal distribution of yellowfin tuna from 1995 to 2019. The results could inform the sustainable exploitation of yellowfin tuna in the Pacific Ocean and provide a feature selection benchmark for machine-learning-based distribution modeling of other pelagic species.
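As a rough illustration of the RFECV idea described in the abstract, the sketch below uses scikit-learn's RFECV with a random forest regressor. It is not the authors' code. The data are synthetic, the feature names are placeholders loosely modeled on the habitat variables listed above, and the response variable, the hyperparameters, the 5-fold split, and the R^2 scoring are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.model_selection import KFold

# Synthetic stand-in for habitat data (assumption: real inputs would be
# gridded environmental variables matched to fishery records).
rng = np.random.default_rng(0)
feature_names = [
    "sea_temperature", "dissolved_oxygen", "chlorophyll_a",
    "salinity", "sea_surface_height", "noise_1", "noise_2", "noise_3",
]
n_samples = 300
X = rng.normal(size=(n_samples, len(feature_names)))

# Hypothetical response: depends only on the first five columns,
# so the three noise columns carry no signal.
y = (
    2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2]
    + 0.8 * X[:, 3] + 0.5 * X[:, 4]
    + rng.normal(scale=0.1, size=n_samples)
)

# RFECV repeatedly drops the least important feature (step=1) and uses
# cross-validation to pick the feature count with the best mean score.
selector = RFECV(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    step=1,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="r2",
    min_features_to_select=1,
)
selector.fit(X, y)

# support_ is a boolean mask over the input columns.
selected = [name for name, keep in zip(feature_names, selector.support_) if keep]
print("Number of features kept:", selector.n_features_)
print("Selected features:", selected)
```

The same selector could in principle wrap any estimator that exposes feature importances or coefficients, which is one way the four tree-based models named in the abstract could each arrive at their own variable set.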
About the journal:
The Canadian Journal of Fisheries and Aquatic Sciences is the primary publishing vehicle for the multidisciplinary field of aquatic sciences. It publishes perspectives (syntheses, critiques, and re-evaluations), discussions (comments and replies), articles, and rapid communications, relating to current research on -omics, cells, organisms, populations, ecosystems, or processes that affect aquatic systems. The journal seeks to amplify, modify, question, or redirect accumulated knowledge in the field of fisheries and aquatic science.