Factor Analysis and Prediction of Disease Risk Based on Large Ensembles of Models: Application to Virus Yellows in Sugar Beet.

IF 3.1 · Zone 2 (Agricultural and Forestry Sciences) · Q2 Plant Sciences

Phytopathology Pub Date : 2025-10-01 DOI:10.1094/PHYTO-01-25-0014-FI

D Chauvin, E Gabriel, D Martinetti, J Papaïx, C Martinez, G Geniaux, F Joudelat, S Soubeyrand

Citations: 0

Abstract

Identifying disease risk factors, characterizing their effects, and forecasting disease risk across space and time are crucial tasks in human, animal, and plant epidemiology. Statistical and machine learning models have largely superseded purely descriptive analyses of data in handling these tasks. In addition, these models have demonstrated their full potential in the current era, characterized by an unprecedented abundance of data. However, applying these models to real-world, large-scale data sets raises critical questions: Which model should be used? Which explanatory variables should be selected? What data should be allocated for training and validation? The answers to these questions often have a significant impact on the analysis outcomes. One way to address some of these challenges is to analyze risk factors and predict risk by using an ensemble of models rather than relying on a single model. This approach is developed in this article and implemented in the case of virus yellows in sugar beet in France. Among the explanatory variables correlated with the severity of virus yellows, we identified winter and spring temperatures (positive correlation), spring humidity and precipitation (negative correlation), the proportion of cereal crops (positive correlation), the proportion of grasslands (negative correlation), and the distance to sugar beet seed production fields (negative correlation). Additionally, we found that predictions are generally more robust when using a spatial aggregation of models compared with relying on the best individual model. Our approach is highly versatile and can be applied to characterize and predict the spatiotemporal distributions of diverse diseases.
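The ensemble idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the variable names (`temperature`, `humidity`, `severity`) and all helper functions are hypothetical, each ensemble member is a one-variable least-squares line fitted on a random training subset, and the aggregation is a plain unweighted mean rather than the spatial aggregation the article refers to. The sketch shows two things: how varying the explanatory variable and the training split yields an ensemble, and how the share of members agreeing on the sign of an effect gives a simple robustness check for a risk factor.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one variable."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return my - slope * mx, slope


def make_synthetic_rows(n=200, seed=1):
    """Synthetic data (assumption): severity rises with temperature, falls with humidity."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        temp, hum = rng.gauss(0, 1), rng.gauss(0, 1)
        severity = 1.0 * temp - 0.5 * hum + rng.gauss(0, 0.3)
        rows.append({"temperature": temp, "humidity": hum, "severity": severity})
    return rows


def build_ensemble(rows, target, variables, n_splits=20, train_frac=0.7, seed=0):
    """One member per (explanatory variable, random training subset) pair."""
    rng = random.Random(seed)
    members = []
    for var in variables:
        for _ in range(n_splits):
            train = rng.sample(rows, int(train_frac * len(rows)))
            intercept, slope = fit_line(
                [r[var] for r in train], [r[target] for r in train]
            )
            members.append({"var": var, "intercept": intercept, "slope": slope})
    return members


def predict(members, row):
    """Aggregate by an unweighted mean of the member predictions."""
    return statistics.fmean(
        m["intercept"] + m["slope"] * row[m["var"]] for m in members
    )


def sign_agreement(members, var):
    """Fraction of the members using `var` whose fitted slope is positive."""
    slopes = [m["slope"] for m in members if m["var"] == var]
    return sum(s > 0 for s in slopes) / len(slopes)


if __name__ == "__main__":
    rows = make_synthetic_rows()
    members = build_ensemble(rows, "severity", ["temperature", "humidity"])
    for var in ("temperature", "humidity"):
        print(var, "share of positive slopes:", sign_agreement(members, var))
    print("ensemble prediction:", predict(members, {"temperature": 1.0, "humidity": 0.0}))
```

A share of positive slopes close to 1 or to 0 indicates that the direction of the effect does not depend on which training subset was drawn; values near 0.5 would suggest the sign is not robust to that choice.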

Source journal

Phytopathology (Biology: Plant Sciences)

CiteScore

5.90

Self-citation rate

9.40%

Articles published

505

Review time

4-8 weeks

Journal description: Phytopathology publishes articles on fundamental research that advances understanding of the nature of plant diseases, the agents that cause them, their spread, the losses they cause, and measures that can be used to control them. Phytopathology considers manuscripts covering all aspects of plant diseases including bacteriology, host-parasite biochemistry and cell biology, biological control, disease control and pest management, description of new pathogen species, ecology and population biology, epidemiology, disease etiology, host genetics and resistance, mycology, nematology, plant stress and abiotic disorders, postharvest pathology and mycotoxins, and virology. Papers dealing mainly with taxonomy, such as descriptions of new plant pathogen taxa, are acceptable if they include plant disease research results such as pathogenicity, host range, etc. Taxonomic papers that focus on classification, identification, and nomenclature below the subspecies level may also be submitted to Phytopathology.