Application of Sediment Fingerprinting to Apportion Sediment Sources: Using Machine Learning Models
Kritika Malhotra, Jingyi Zheng, Ash Abebe, Jasmeet Lamba
Journal of the ASABE. Publication date: 2023-01-01. DOI: 10.13031/ja.14906 (https://doi.org/10.13031/ja.14906)
Abstract
Highlights
- Relative source contributions to stream bed sediment from construction sites and stream banks were quantified.
- Two machine-learning techniques were used to select composite fingerprinting properties.
- The MixSIR Bayesian model was employed for source apportionment.
- The statistical methods used to select fingerprinting properties have the potential to affect source apportionments.
- Management strategies to reduce sediment mobilization should be targeted according to the dominant source of sediment in each sub-watershed.

Abstract. Sediment fingerprinting is an extensively used approach for investigating sediment sources by linking in-stream sediment mixtures with watershed source materials. The overall goal of this research was to estimate the relative contributions of stream banks and construction sites to the stream bed sediment in an urbanized watershed (Alabama, USA) at a sub-watershed scale, using a fingerprinting technique based on composite fingerprints selected by two different machine learning techniques. The two statistical approaches employed to select the subset of fingerprinting properties were: (1) the Random Forest (RF) algorithm with Gini importance ranking of variables; and (2) logistic regression with the least absolute shrinkage and selection operator (LASSO). A Bayesian mixing model was then used to estimate the distribution of mixing proportions along with the associated uncertainty. The models were built on the composite fingerprints selected by the two machine learning methods.

Overall, using the subsets of fingerprints selected by RF and LASSO, the relative contribution of stream banks ranged from 14±9% to 97±2% and from 24±18% to 94±5%, respectively, throughout the watershed. The stream bank contributions were compared with those from a previous study conducted in the watershed, which used a two-step statistical procedure (a Mann-Whitney U-test as the first step and discriminant function analysis (DFA) as the second step) to select the composite of fingerprinting properties and a frequentist mixing model to calculate the source apportionments. The relative contributions of stream banks to stream bed sediment reported in the previous study ranged from 9±8% to 100±1%. The study therefore demonstrated that source attributions depend on the statistical procedures used to select the optimum composite fingerprints for sediment fingerprinting applications. Furthermore, the results underscored the importance of using different mixing model structures to obtain reliable estimates of source contributions.

Keywords: Least absolute shrinkage and selection operator (LASSO), MixSIR Bayesian model, Random Forest (RF), Statistical techniques.
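
The Bayesian mixing step mentioned in the abstract can be illustrated with a small sampling-importance-resampling (SIR) sketch for two sources. This is not the authors' code or the MixSIR software itself: it assumes a uniform prior on the stream-bank proportion, normally distributed tracers, and a mixture mean and variance formed from proportion-weighted source means and variances. The tracer names (`tracer_a`, `tracer_b`) and all numeric values are hypothetical and chosen only so the example runs.

```python
import math
import random
import statistics


def mixture_log_likelihood(p_bank, mixture, bank, site):
    """Log-likelihood of mixture tracer values given the stream-bank proportion."""
    total = 0.0
    for tracer, values in mixture.items():
        mu_b, sd_b = bank[tracer]
        mu_s, sd_s = site[tracer]
        # Assumed mixing rule: proportion-weighted mean and variance of the sources.
        mean = p_bank * mu_b + (1.0 - p_bank) * mu_s
        var = (p_bank * sd_b) ** 2 + ((1.0 - p_bank) * sd_s) ** 2
        for x in values:
            total += -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)
    return total


def sir_posterior(mixture, bank, site, n_draws=20000, n_keep=2000, seed=0):
    """Draw proportions from the prior, weight by likelihood, then resample."""
    rng = random.Random(seed)
    draws = [rng.random() for _ in range(n_draws)]  # uniform prior on [0, 1)
    log_w = [mixture_log_likelihood(p, mixture, bank, site) for p in draws]
    max_lw = max(log_w)  # subtract the maximum to avoid underflow in exp()
    weights = [math.exp(lw - max_lw) for lw in log_w]
    return rng.choices(draws, weights=weights, k=n_keep)


# Hypothetical source signatures: tracer -> (mean, standard deviation).
bank = {"tracer_a": (10.0, 1.0), "tracer_b": (50.0, 4.0)}
site = {"tracer_a": (20.0, 1.5), "tracer_b": (30.0, 3.0)}
# Hypothetical stream bed sediment samples for one sub-watershed.
mixture = {"tracer_a": [12.8, 13.4, 12.9], "tracer_b": [44.5, 43.6, 44.9]}

posterior = sir_posterior(mixture, bank, site)
bank_mean = statistics.mean(posterior)
bank_sd = statistics.stdev(posterior)
site_mean = 1.0 - bank_mean

print(f"stream bank contribution: {100 * bank_mean:.0f} ± {100 * bank_sd:.0f}%")
print(f"construction site contribution: {100 * site_mean:.0f}%")
```

The resampled values approximate the posterior distribution of the stream-bank proportion, so their mean and standard deviation give a "mean ± SD" summary in the same form as the ranges reported in the abstract. In a fingerprinting workflow of the kind described, the tracers passed to such a model would be the composite fingerprint chosen beforehand by the RF or LASSO selection step.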