Explaining the impact of parameter combinations in agent-based models

Impact factor 3.1 · Zone 3 (Computer Science) · JCR Q2 (Computer Science, Interdisciplinary Applications)

Journal of Computational Science Pub Date : 2024-06-21 DOI:10.1016/j.jocs.2024.102342

Megan Olsen, D. Richard Kuhn, M.S. Raunak

Volume 81, Article 102342. Full text: https://www.sciencedirect.com/science/article/pii/S1877750324001352

Abstract

Simulation is a useful and effective way to analyze and study complex, real-world systems, allowing researchers, practitioners, and decision makers to make sense of the inner workings of a system involving many factors, often resulting in some sort of emergent behavior. The number of parameter value combinations grows exponentially with the number of parameters, and it quickly becomes infeasible to test them all or even to explore a suitable subset. How does one then efficiently identify the parameter value combinations that matter for a particular simulation study, and determine their impact on the result? In addition, is it possible to train a machine learning model on a systematically chosen small subset of parameter value combinations, such that the outcome of an agent-based model (ABM) could be predicted without running the ABM? We use covering arrays to create $t$-way ($t$ = 2, 3, etc.) combinations of parameter values to significantly reduce an ABM’s parameter value exploration space, an approach supported by our prior work. In our ICCS 2023 paper (Olsen et al., 2023) we built on that work by applying it to Wilensky’s Heatbugs model and training a random forest machine learning model to predict simulation results, using the covering arrays to select our training and test data. Our results show that a 2-way covering array provides sufficient training data for our random forest to predict three different simulation outcomes. Our process of using covering arrays to shrink the parameter space and then predicting ABM results with machine learning is successful. In this paper, which extends the ICCS 2023 paper (Olsen et al., 2023), we analyze the role of parameter combinations and parameter values in determining model output via combination frequency difference (CFD) analysis and Shapley values. CFD has not previously been applied to agent-based models; we provide a process for using this approach and compare and contrast it with Shapley values and random forest feature importance.
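The two combinatorial ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the covering arrays were generated, so the greedy construction below is an assumed stand-in, and the CFD function is a simplified reading of the term (the frequency of each t-way value combination among runs in an outcome class minus its frequency among the remaining runs). The parameter names `a`, `b`, `c` and their values are hypothetical and are not Heatbugs settings.

```python
from collections import Counter
from itertools import combinations, product


def pairs_of(row, names):
    """All 2-way (parameter, value) pairs contained in one configuration."""
    return {((a, row[a]), (b, row[b])) for a, b in combinations(names, 2)}


def pairwise_covering_array(params):
    """Greedy 2-way covering array (assumed construction, small spaces only).

    Every value pair for every pair of parameters appears in at least one
    returned row. Candidates are enumerated exhaustively, which is only
    practical for small illustrative parameter spaces.
    """
    names = list(params)
    candidates = [dict(zip(names, vals)) for vals in product(*params.values())]
    uncovered = set().union(*(pairs_of(c, names) for c in candidates))
    rows = []
    while uncovered:
        # Choose the configuration covering the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_of(c, names) & uncovered))
        rows.append(best)
        uncovered -= pairs_of(best, names)
    return rows


def combination_frequency_difference(configs, labels, t=2):
    """Simplified CFD: per t-way combination, class frequency minus non-class frequency.

    configs: list of dicts mapping parameter name -> value.
    labels:  list of booleans, True when the run falls in the class of interest.
    """
    names = sorted(configs[0])

    def frequencies(group):
        counts = Counter()
        for row in group:
            for combo in combinations(names, t):
                counts[tuple((n, row[n]) for n in combo)] += 1
        return {k: v / len(group) for k, v in counts.items()} if group else {}

    in_class = frequencies([c for c, flag in zip(configs, labels) if flag])
    out_class = frequencies([c for c, flag in zip(configs, labels) if not flag])
    keys = set(in_class) | set(out_class)
    return {k: in_class.get(k, 0.0) - out_class.get(k, 0.0) for k in keys}


# Hypothetical parameter space: 3 parameters with 3 values each (27 full runs).
params = {"a": [0, 1, 2], "b": [0, 1, 2], "c": [0, 1, 2]}
rows = pairwise_covering_array(params)
print(len(rows), "runs cover all 2-way combinations; exhaustive testing needs 27")
```

Combinations with a large positive difference occur mostly in runs that reach the outcome class, which is what makes them candidates for explaining that outcome. The greedy search trades optimal array size for simplicity; a dedicated covering array generator would be the usual choice for realistic parameter spaces.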




Source journal

Journal of Computational Science (Computer Science, Interdisciplinary Applications; Computer Science, Theory & Methods)

CiteScore: 5.50
Self-citation rate: 3.00%
Articles published: 227
Review time: 41 days

Journal description: Computational Science is a rapidly growing multi- and interdisciplinary field that uses advanced computing and data analysis to understand and solve complex problems. It has reached a level of predictive capability that now firmly complements the traditional pillars of experimentation and theory. Recent advances in experimental techniques, such as detectors, on-line sensor networks, and high-resolution imaging, have opened up new windows into physical and biological processes at many levels of detail. The resulting data explosion allows for detailed data-driven modeling and simulation. This new discipline in science combines computational thinking, modern computational methods, devices, and collateral technologies to address problems far beyond the scope of traditional numerical methods. Computational science typically unifies three distinct elements:
• Modeling, Algorithms and Simulations (e.g., numerical and non-numerical, discrete and continuous);
• Software developed to solve science (e.g., biological, physical, and social), engineering, medicine, and humanities problems;
• Computer and information science that develops and optimizes the advanced system hardware, software, networking, and data management components (e.g., problem solving environments).