Effects of multicollinearity and data granularity on regression models of stream temperature
Authors: Halil I. Dertli, Daniel B. Hayes, Troy G. Zorn
Journal: Journal of Hydrology (journal article, published 2024-06-27)
DOI: 10.1016/j.jhydrol.2024.131572
URL: https://www.sciencedirect.com/science/article/pii/S0022169424009685
Water temperature is a key factor influencing biota of stream ecosystems. Hence, it is important to comprehend the environmental drivers of stream temperature for robust prediction of conditions and effective management of stream communities. Linear regression models are commonly used for predictive purposes, but their predictive capacity and interpretability can be significantly affected by their complexity and the structure of input data. In some cases, researchers may be obligated to favor prediction power or interpretability while compromising the other. Therefore, insight into relationships between model fit, correlation among predictor variables (i.e., multicollinearity), and level of temporal aggregation of data (i.e., data granularity) may be helpful to reduce such trade-offs. In this paper, we investigated these relationships within a hierarchical set of multiple linear regression (MLR) models examining environmental factors influencing stream temperature dynamics. Our findings showed that as the number of predictor variables (i.e., model complexity) increased, the magnitude of multicollinearity in MLR models increased, but model fit also increased. The results also revealed that using data averaged over longer time frames (i.e., coarser data granularity) yielded high multicollinearity, as indexed by variance inflation factor values (VIF) for all model predictors. This led to higher variance in parameter estimates (i.e., parameter instability) and potential challenges in model interpretation as the sign of parameter estimates changed in many streams examined. Multicollinearity was not the only reason for these changes in the sign of parameter estimates as they were also observed in simple linear regression models across varying levels of data granularity. Based on our findings, we conclude that the selection of data granularity is an important consideration in multiple regression modeling, with profound implications for model interpretability.
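The variance inflation factor mentioned in the abstract is conventionally defined for predictor j as VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. The sketch below shows how VIF might be compared across levels of temporal aggregation. It is only an illustration under stated assumptions: the data are synthetic (two hypothetical predictors sharing a seasonal cycle plus independent daily noise), the helper names `vif` and `block_mean` are invented here, and nothing in it reproduces the authors' data, models, or results.

```python
import numpy as np


def vif(X):
    """Return the variance inflation factor of each column of X.

    X has shape (n_samples, n_predictors). Each column is regressed on
    the other columns plus an intercept, and VIF_j = 1 / (1 - R_j^2).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out


def block_mean(X, window):
    """Average consecutive blocks of `window` rows.

    An incomplete final block is dropped. This stands in for coarser
    data granularity (for example, daily values averaged to ~monthly).
    """
    X = np.asarray(X, dtype=float)
    n = (X.shape[0] // window) * window
    return X[:n].reshape(-1, window, X.shape[1]).mean(axis=1)


# Synthetic example (assumption): two predictors that share a seasonal
# signal but have independent day-to-day noise.
rng = np.random.default_rng(0)
days = np.arange(3650)
season = np.sin(2 * np.pi * days / 365.0)
air_temp = season + rng.normal(scale=1.0, size=days.size)
solar = season + rng.normal(scale=1.0, size=days.size)
daily = np.column_stack([air_temp, solar])

vif_daily = vif(daily)
vif_monthly = vif(block_mean(daily, 30))

print("VIF, daily data:       ", vif_daily)
print("VIF, 30-day averages:  ", vif_monthly)
```

In this toy setup, averaging suppresses the independent daily noise while leaving the shared seasonal signal largely intact, so the two predictors become more strongly correlated at coarser granularity. That is one plausible mechanism for illustration only and should not be read as the explanation given in the paper.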
About the journal:
The Journal of Hydrology publishes original research papers and comprehensive reviews in all the subfields of the hydrological sciences, including water-based management and policy issues that impact on economics and society. These comprise, but are not limited to, the physical, chemical, biogeochemical, stochastic and systems aspects of surface and groundwater hydrology, hydrometeorology and hydrogeology. Relevant topics incorporating the insights and methodologies of disciplines such as climatology, water resource systems, hydraulics, agrohydrology, geomorphology, soil science, instrumentation and remote sensing, and civil and environmental engineering are included. Social science perspectives on hydrological problems, such as resource and ecological economics, environmental sociology, psychology and behavioural science, and management and policy analysis, are also invited. Multi- and interdisciplinary analyses of hydrological problems are within scope. The science published in the Journal of Hydrology is relevant to catchment scales rather than exclusively to a local scale or site.