Impact of variable selection and model complexity on the prediction of water quality parameters for Penaeus vannamei aquaculture in a short dataset context
Vinícius Fellype Cavalcanti de França, Luis Otavio Brito da Silva, Humber Agrelli de Andrade
Journal: Aquacultural Engineering, volume 112, Article 102640 (Journal Article)
Published: 2025-09-28
DOI: 10.1016/j.aquaeng.2025.102640
URL: https://www.sciencedirect.com/science/article/pii/S0144860925001293
Impact factor: 4.3; JCR: Q2 (Agricultural Engineering); open access: no
Abstract
Aquaculture is expanding rapidly worldwide, increasing the demand for efficient water quality management in shrimp farming. In this study, we evaluated the impact of variable selection and model complexity on the prediction of the mean of water parameters using machine learning. Two variable selection approaches were applied: a Granger causality test to capture temporal predictability, and a backward procedure based on the Akaike Information Criterion (AIC) to balance model fit and complexity. An experimental dataset of 106 observations of temperature, dissolved oxygen, salinity and pH was standardised and modelled using linear regression and a random forest regressor. Model performance was assessed by cross-validation using mean squared error (MSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) as metrics. The linear regressor performed significantly better than the random forest, suggesting that simpler models may be more effective than more complex ones when data are limited.
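To make the backward AIC procedure and the cross-validated error metrics named in the abstract concrete, here is a minimal sketch on synthetic data. It is not the authors' code or data. The effect sizes, the 5-fold split, the helper names (`ols_aic`, `backward_aic`, `cv_errors`) and the Gaussian AIC form n·ln(RSS/n) + 2k are all assumptions made for illustration. The Granger causality step and the random forest comparison are left out.

```python
import numpy as np

def ols_aic(X, y):
    """AIC of an OLS fit with intercept (Gaussian likelihood, constants dropped)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]

def backward_aic(X, y, names):
    """Drop one predictor at a time for as long as doing so lowers the AIC."""
    keep = list(range(X.shape[1]))
    best = ols_aic(X[:, keep], y)
    while len(keep) > 1:
        trials = [(ols_aic(X[:, [j for j in keep if j != d]], y), d) for d in keep]
        aic, drop = min(trials)
        if aic >= best:  # no single removal improves the AIC: stop
            break
        best = aic
        keep.remove(drop)
    return [names[j] for j in keep], best

def cv_errors(X, y, k=5):
    """k-fold cross-validated MSE, MAE and MAPE for OLS, using contiguous folds."""
    n = len(y)
    errors = []
    for test in np.array_split(np.arange(n), k):
        train = np.setdiff1d(np.arange(n), test)
        A = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred = np.column_stack([np.ones(len(test)), X[test]]) @ beta
        errors.append(y[test] - pred)
    e = np.concatenate(errors)  # folds are in order, so e lines up with y
    return {
        "MSE": float(np.mean(e ** 2)),
        "MAE": float(np.mean(np.abs(e))),
        "MAPE": float(100 * np.mean(np.abs(e / y))),  # needs y != 0
    }

# Synthetic stand-in data (assumption): 106 rows, standardised predictors,
# and a target kept in raw units so that MAPE is well defined.
rng = np.random.default_rng(0)
n = 106
names = ["temperature", "salinity", "pH"]
X = rng.standard_normal((n, 3))
y = 6.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.3 * rng.standard_normal(n)

full_aic = ols_aic(X, y)
selected, final_aic = backward_aic(X, y, names)
metrics = cv_errors(X[:, [names.index(s) for s in selected]], y)
print(selected, round(final_aic, 2), metrics)
```

The folds are contiguous rather than shuffled, which is one reasonable choice for time-ordered measurements. The abstract does not say which cross-validation scheme the study used.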
About the journal:
Aquacultural Engineering is concerned with the design and development of effective aquacultural systems for marine and freshwater facilities. The journal aims to apply knowledge gained from basic research that can potentially be translated into commercial operations.
Problems of scale-up and application of research data involve many parameters, both physical and biological, making it difficult to anticipate the interaction between the unit processes and the cultured animals. Aquacultural Engineering aims to develop this bioengineering interface for aquaculture and welcomes contributions in the following areas:
– Engineering and design of aquaculture facilities
– Engineering-based research studies
– Construction experience and techniques
– In-service experience, commissioning, operation
– Materials selection and their uses
– Quantification of biological data and constraints