Jump around: Selecting Markov Chain Monte Carlo parameters and diagnostics for improved food web model quality and ecosystem representation

IF 5.8 · Zone 2, Environmental Science and Ecology · JCR Q1 ECOLOGY

Ecological Informatics · Pub Date: 2024-10-24 · DOI: 10.1016/j.ecoinf.2024.102865

Gemma Gerber, Ursula M. Scharler

Ecological Informatics, volume 84, Article 102865. Full text: https://www.sciencedirect.com/science/article/pii/S1574954124004072

Citations: 0

Abstract



Jump around: Selecting Markov Chain Monte Carlo parameters and diagnostics for improved food web model quality and ecosystem representation

Capturing ecological data variability in food web models is an important step for improving model representation of empirical systems. One approach is to use linear inverse modelling and Markov Chain Monte Carlo (LIM-MCMC) techniques to set up an inverse LIM problem using empirical data constraints, and then sample multiple plausible food webs from the inverse problem using an MCMC algorithm. We describe the set of plausible food webs as an ‘ensemble’ of solutions to the inverse problem sampled with the LIM-MCMC algorithm. The extent of data variability eventually integrated into an ensemble depends on how well the LIM-MCMC algorithm samples the solution space. Algorithm quality can be adjusted via user-defined parameters describing starting points, jump sizes, and number of iterations or food webs produced. However, little information exists on how each LIM-MCMC algorithm parameter affects the degree of empirical data variability introduced into the ensemble. Further, post hoc algorithm quality diagnostics with commonly used trace plots and the coefficient of variation (CoV) rarely address critical aspects of algorithm quality, such as (1) whether the returned ensemble successfully targeted the solution space distribution (stationarity), (2) correlation between ensemble solutions (mixing), and (3) whether the ensemble contains enough solutions to adequately capture input data variability (sampling efficiency). Therefore, we used several established MCMC convergence diagnostics to (1) quantify how algorithm parameters affect ensemble flow values and whether these differences propagate to ecological indicators and (2) evaluate algorithm quality and compare to current evaluation and ecosystem modelling methods. We applied 30 LIM-MCMC algorithm combinations of varying starting points, jump sizes, and number of iterations to solve food web ensembles from a single food web model. We analysed ensembles with Ecological Network Analysis (ENA) to calculate indicators describing system function. Results show that LIM-MCMC algorithm parameters, in particular the jump size, affect ensemble flow values, which propagate to ecological indicators describing different ecosystem function of the same model. Thereafter, comparisons of post hoc diagnostics show that MCMC convergence diagnostics provided more robust estimates of algorithm quality than trace plots and CoV. Together, these findings underpin several novel recommendations to enhance LIM-MCMC algorithm parameter selection and quality assessments applicable to any ecological ensemble network study.
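To make the three quality aspects named in the abstract more concrete, the following Python sketch may help. It is an illustration under stated assumptions, not the authors' method: the abstract does not say which sampler implementation or which specific diagnostics were used. Here a hypothetical one-dimensional "flow" bounded on [0, 1] is sampled with a reflected random walk whose step standard deviation plays the role of the jump size, and the ensemble is then summarised with the CoV, lag-1 autocorrelation (mixing), a truncated-sum effective sample size (sampling efficiency), and a simplified Geweke-style z-score (stationarity). All function names and parameter values are invented for this example.

```python
import numpy as np


def reflect(x, lo, hi):
    """Fold a proposal back into [lo, hi] by mirroring at the bounds."""
    span = hi - lo
    y = (x - lo) % (2 * span)
    return lo + (2 * span - y if y > span else y)


def sample_chain(n_iter, jump, start, lo=0.0, hi=1.0, seed=0):
    """Reflected random walk on [lo, hi]; `jump` is the step standard deviation.

    Toy stand-in for one flow of a food web ensemble (an assumption).
    """
    rng = np.random.default_rng(seed)
    chain = np.empty(n_iter)
    x = start
    for i in range(n_iter):
        x = reflect(x + rng.normal(0.0, jump), lo, hi)
        chain[i] = x
    return chain


def cov(chain):
    """Coefficient of variation: sample standard deviation over mean."""
    return float(chain.std(ddof=1) / chain.mean())


def autocorr(chain, lag):
    """Sample autocorrelation at a positive lag (mixing)."""
    c = chain - chain.mean()
    return float(np.dot(c[:-lag], c[lag:]) / np.dot(c, c))


def effective_sample_size(chain, max_lag=1000):
    """ESS with the autocorrelation sum cut at the first non-positive lag."""
    n = len(chain)
    total = 0.0
    for lag in range(1, min(max_lag, n - 1)):
        rho = autocorr(chain, lag)
        if rho <= 0:
            break
        total += rho
    return n / (1 + 2 * total)


def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke-style z-score comparing early and late segment means.

    Uses ESS-adjusted variances instead of spectral density estimates.
    """
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[int((1 - last) * n):]
    var_a = a.var(ddof=1) / effective_sample_size(a)
    var_b = b.var(ddof=1) / effective_sample_size(b)
    return float((a.mean() - b.mean()) / np.sqrt(var_a + var_b))


if __name__ == "__main__":
    for jump in (0.01, 0.3):
        chain = sample_chain(n_iter=5000, jump=jump, start=0.5)
        print(
            f"jump={jump}: CoV={cov(chain):.3f}, "
            f"lag-1 autocorr={autocorr(chain, 1):.3f}, "
            f"ESS={effective_sample_size(chain):.0f}, "
            f"Geweke z={geweke_z(chain):.2f}"
        )
```

In this toy setting a very small jump size tends to give strongly correlated solutions and a low effective sample size even when the CoV alone looks unremarkable, which is the kind of gap between CoV and convergence diagnostics the abstract describes.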


Source journal

Ecological Informatics · Environmental Sciences - Ecology

CiteScore: 8.30

Self-citation rate: 11.80%

Articles published: 346

Review time: 46 days

About the journal: Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. Its scope takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data, and the critical need to inform sustainable management in view of global environmental and climate change. The journal is interdisciplinary, sitting at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, and the modelling and prediction of ecological data.