Andrea Simoncini, Dimitri Giunchi, Marta Marcucci, Alessandro Massolo
A framework for predicting zoonotic hosts using pseudo-absences: the case of Echinococcus multilocularis
Ecological Informatics, volume 90, Article 103295. Published 2025-06-24. DOI: 10.1016/j.ecoinf.2025.103295
https://www.sciencedirect.com/science/article/pii/S1574954125003048
Abstract
Identifying the host range of zoonotic parasites is challenging due to limited data and sampling biases. In particular, while more information exists for susceptible hosts, data on resistant species are extremely scarce. Echinococcus multilocularis (Leuckart, 1863) (Cestoda: Taeniidae) is the causative agent of alveolar echinococcosis, one of the most significant food-borne zoonoses worldwide. Using data on susceptibility and competence of Holarctic cricetid and murid rodents, key intermediate hosts for E. multilocularis, we developed models to predict the likelihood of infection for any rodent species in the Holarctic. These models incorporated morphological and ecological characteristics and employed two approaches: Generalized Linear Models (GLM) and Presence-Unlabeled Learning (PU-L), a machine learning technique. To train the models, we defined pseudo-absences based on the bias in research effort. We compared the two algorithms and selected GLM as the more effective, using it to map potentially susceptible rodent species across phylogeny and geographic space. Predictions identified several potentially unreported hosts, suggesting that the current understanding of E. multilocularis host distribution may underestimate the true risk. The predicted richness of intermediate hosts peaked in Central-Eastern Europe, Western North America and Central Asia, while the ratio of predicted hosts to total rodent richness was highest in the northern latitudes and the Tibetan Plateau. The average temperature in the geographic range and range size emerged as the strongest predictors of host susceptibility. The workflow demonstrates promise for application to other host-parasite systems with unknown host ranges.
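The abstract does not give the rule used to define pseudo-absences, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical rule (an unlabeled species counts as a pseudo-absence if its research effort is at least the median effort among known hosts), a hypothetical `effort` field (for example, a publication count), and synthetic data whose effect directions are arbitrary. The binomial GLM with logit link is fitted with plain gradient ascent so that the example needs only the standard library.

```python
import math
import random

def predict(w, x):
    """Logit-link prediction; w[0] is the intercept."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def select_pseudo_absences(species, quantile=0.5):
    """Hypothetical rule: unlabeled species studied at least as much as the
    chosen quantile of known hosts are treated as pseudo-absences."""
    host_effort = sorted(s["effort"] for s in species if s["host"])
    cutoff = host_effort[int(quantile * (len(host_effort) - 1))]
    return [s for s in species if not s["host"] and s["effort"] >= cutoff]

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Binomial GLM (logit link) fitted by batch gradient ascent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict(w, xi)
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Synthetic species table (illustrative only; values and effect
# directions are arbitrary and are not taken from the paper).
random.seed(0)
species = []
for i in range(60):
    is_host = i < 20
    species.append({
        "name": f"sp{i}",
        "host": is_host,
        "temp": random.gauss(-1.0 if is_host else 0.5, 0.7),   # standardized mean temperature
        "range": random.gauss(1.0 if is_host else -0.3, 0.7),  # standardized range size
        "effort": random.randint(5, 50) if is_host else random.randint(0, 50),
    })

def features(s):
    return [s["temp"], s["range"]]

hosts = [s for s in species if s["host"]]
pseudo = select_pseudo_absences(species)

# Train on known hosts (1) versus pseudo-absences (0).
X = [features(s) for s in hosts + pseudo]
y = [1] * len(hosts) + [0] * len(pseudo)
weights = fit_logistic(X, y)

# Score the remaining unlabeled species and list the top candidates.
pseudo_names = {s["name"] for s in pseudo}
candidates = [s for s in species
              if not s["host"] and s["name"] not in pseudo_names]
ranked = sorted(candidates, key=lambda s: predict(weights, features(s)),
                reverse=True)
for s in ranked[:5]:
    print(s["name"], round(predict(weights, features(s)), 3))
```

The design choice illustrated here is that poorly studied species are left unlabeled instead of being treated as negatives, so a lack of records caused by low sampling is not mistaken for resistance.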
About the journal:
The journal Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data, as well as the critical need for informing sustainable management in view of global environmental and climate change.
The nature of the journal is interdisciplinary at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, modelling and prediction of ecological data.