Literature-based explainable machine learning models for predicting pathogen and antibiotic resistance gene loads from animal manure
Authors: Ayşe Birsen Kadıoğlu Gökalp, Handan Atalay Eroğlu, Elif Nihan Kadıoğlu
Journal: Microbial Risk Analysis, volume 30, Article 100355 (published 2025-09-23)
Impact factor: 4.0; JCR: Q2 (Environmental Sciences)
DOI: 10.1016/j.mran.2025.100355
URL: https://www.sciencedirect.com/science/article/pii/S2352352225000155
Citation count: 0
Abstract
Applying animal manure (from cattle, pigs, poultry, and sheep) to agricultural land offers significant advantages, such as increasing soil fertility and reducing the use of chemical fertilizers. However, it also poses serious environmental and public health risks, because microbial contaminants such as pathogenic microorganisms and antibiotic resistance genes (ARGs) can spread into the environment. To assess this dual risk, we developed a machine learning (ML) framework that simultaneously predicts pathogen load and ARG levels. The dataset contains 223 records systematically collected from 54 scientific studies published between 2015 and 2024. Six regression models were compared; the Gradient Boosting algorithm (R² = 0.93) for pathogen load and the Ridge Regression algorithm (R² = 0.84) for ARG level achieved the highest predictive accuracy. Model generalizability was tested with 5- and 10-fold cross-validation. Learning curves and residual analysis confirmed a low overfitting risk for the final selected models (Gradient Boosting for pathogen load and Ridge Regression for ARG level), whereas other models, such as Decision Tree, showed clear signs of overfitting and were therefore excluded from further analysis. The transparency of model decisions was examined with SHapley Additive exPlanations (SHAP) analyses, which highlighted “application period”, “ARG type” and “fertilizer type” as the determining variables. In addition, Partial Dependence Plot (PDP) analyses revealed the marginal effects of environmental and operational factors on the target variables in a biologically meaningful way. This integrated modelling approach contributes to the optimization of sustainable fertilization strategies and the development of environmental-health policies.
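The 5- and 10-fold cross-validation step mentioned in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' pipeline: the data are synthetic (only the record count of 223 is borrowed from the abstract), the single-feature ridge regression is a deliberately simplified stand-in for the six models compared, and the helper names `make_synthetic`, `ridge_fit_1d`, `r2_score` and `k_fold_r2` are hypothetical. In practice one would more likely reach for a library routine such as scikit-learn's `cross_val_score`.

```python
import random
from statistics import fmean


def make_synthetic(n=223, seed=0):
    """Synthetic stand-in data: y = 2x + 1 plus Gaussian noise (NOT the paper's data)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def ridge_fit_1d(xs, ys, alpha):
    """Closed-form ridge fit of y ~ w*x + b; the penalty alpha applies to w only."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / (sxx + alpha)
    b = my - w * mx
    return w, b


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_r2(xs, ys, k, alpha=1.0, seed=0):
    """Return the held-out R^2 of each of k folds (indices shuffled once)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal folds
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        w, b = ridge_fit_1d([xs[i] for i in train], [ys[i] for i in train], alpha)
        preds = [w * xs[i] + b for i in held_out]
        scores.append(r2_score([ys[i] for i in held_out], preds))
    return scores


if __name__ == "__main__":
    xs, ys = make_synthetic()
    for k in (5, 10):
        scores = k_fold_r2(xs, ys, k)
        print(f"{k}-fold mean R^2: {fmean(scores):.3f}")
```

Comparing the mean held-out R² across both fold counts, and against the training-set R², is one simple way to spot the kind of overfitting the abstract reports for the Decision Tree model: a large gap between training and held-out scores is the warning sign.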
About the journal:
The journal Microbial Risk Analysis accepts articles on risk analysis applied to microbial hazards. Manuscripts should cover at least one of the components of risk assessment (risk characterization, exposure assessment, etc.), risk management and/or risk communication in any field of microbiology (clinical, environmental, food, veterinary, etc.). The journal also accepts articles dealing with predictive microbiology, quantitative microbial ecology, mathematical modeling, risk studies applied to microbial ecology, quantitative microbiology for epidemiological studies, statistical methods applied to microbiology, and laws and regulatory policies aimed at lessening the risk of microbial hazards. Work focusing on risk studies of viruses, parasites, microbial toxins, antimicrobial-resistant organisms, genetically modified organisms (GMOs), and recombinant DNA products is also acceptable.