Dante M.L. Horemans, Jennifer C. Lin, Marjorie A.M. Friedrichs, Pierre St-Laurent, Raleigh R. Hood, Christopher W. Brown
Ecological Informatics, Volume 90, Article 103225. Published 2025-06-19. DOI: 10.1016/j.ecoinf.2025.103225
https://www.sciencedirect.com/science/article/pii/S1574954125002341
The effect of collinearity between observed and model derived training variables on estuarine algal species distribution models
Forecasts of organism distributions in time and space are needed to mitigate risks associated with changes in environmental conditions. These forecasts are often generated using correlative species distribution models (SDMs) that relate environmental variables to species presence or abundance. Biological complexity makes the construction of SDMs challenging because the collinearity between the environmental variables used to train the SDM may increase model parameter uncertainty. To analyze the effect of collinearity on SDMs, we (1) train SDMs for seven estuarine algal species commonly observed in the Chesapeake Bay (U.S.A.) using different levels of collinearity in the training information, (2) identify the environmental predictors, and (3) study their association with species presence using two statistical techniques (generalized linear models and regression trees). The novelty of our contribution is that our analysis uses both environmental in situ observations and environmental information generated by a mechanistic model. The environmental variables show strong collinearities in both the in situ observations (32 out of the total of 165 correlations) and mechanistic model output (12 out of the total of 120 correlations). To determine how collinearity between these variables affects our SDM results, we remove environmental variables that surpass a specific correlation threshold. We find that using these two different types of training information (i.e., observed vs. modeled) affects (1) the optimal set of predictors, (2) the associations between environmental variables and algal presence, and (3) the model’s predictive skill. Water temperature is generally selected as an important predictor. Strong positive or negative associations between environmental variables and algal presence are not substantially impacted by the type of training information used. 
Although removing collinearities may result in the detection of new important predictors, it may also result in a slight decrease (approximately 5%) of the SDM prediction skill, depending on the species of interest and type of training information. Our findings suggest that the main environmental predictors rely on both the species characteristics and training information type used in SDM construction, and highlight the challenge of interpreting the associations between environmental conditions and species presence predicted by these SDMs. These insights help us to better understand the environmental conditions of importance to these algal species and hence optimize monitoring efforts by revealing which in situ observations are vital to accurately forecast blooms of these estuarine algae.
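The abstract states that environmental variables surpassing a correlation threshold are removed before training, but it does not give the threshold value or the rule for choosing which variable of a correlated pair to drop. The following is a minimal sketch of one common approach, not the authors' procedure: a greedy filter that keeps a variable only if its absolute Pearson correlation with every already-kept variable stays at or below the threshold. The function names `pearson` and `drop_collinear`, the threshold of 0.7, the priority order, and all sample values are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_collinear(variables, threshold):
    """Greedy collinearity filter (hypothetical helper, not from the paper).

    `variables` maps a variable name to its list of samples. Variables are
    visited in insertion order, so earlier entries take priority. A variable
    is kept only if |r| with every previously kept variable is <= threshold.
    """
    kept = []
    for name, values in variables.items():
        if all(abs(pearson(values, variables[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Made-up illustrative samples, not data from the study.
env = {
    "temperature": [5.0, 10.0, 15.0, 20.0, 25.0],
    "oxygen": [12.1, 10.8, 10.1, 8.9, 8.2],  # falls steadily as temperature rises
    "salinity": [15.0, 8.0, 18.0, 9.0, 14.0],
}

kept = drop_collinear(env, threshold=0.7)  # 0.7 is an assumed threshold
print(kept)  # → ['temperature', 'salinity']
```

In this toy data, oxygen is almost perfectly anticorrelated with temperature and is therefore dropped, while salinity is retained. Because the filter is order-dependent, listing oxygen first would instead drop temperature, which illustrates why the set of selected predictors can shift with the training information and the filtering choices.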
About the journal:
The journal Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data, as well as the critical need for informing sustainable management in view of global environmental and climate change.
The nature of the journal is interdisciplinary at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, modelling and prediction of ecological data.