Explaining the impact of parameter combinations in agent-based models
Authors: Megan Olsen, D. Richard Kuhn, M.S. Raunak
Journal of Computational Science, Volume 81, Article 102342. Published 2024-06-21.
DOI: 10.1016/j.jocs.2024.102342
URL: https://www.sciencedirect.com/science/article/pii/S1877750324001352
Impact factor: 3.1 (JCR Q2, Computer Science, Interdisciplinary Applications)
Citations: 0
Abstract
Simulation is a useful and effective way to analyze and study complex, real-world systems. It allows researchers, practitioners, and decision makers to make sense of the inner workings of a system involving many factors, which often produce some sort of emergent behavior. The number of parameter value combinations grows exponentially with the number of parameters, and it quickly becomes infeasible to test them all or even to explore a suitable subset. How does one then efficiently identify the parameter value combinations that matter for a particular simulation study, and determine their impact on the result? In addition, is it possible to train a machine learning model to predict the outcome of an agent-based model (ABM) with a systematically chosen small subset of parameter value combinations, such that the result could be predicted without running the ABM? We use covering arrays to create t-way (t = 2, 3, etc.) combinations of parameter values to significantly reduce an ABM’s parameter value exploration space, an approach supported by our prior work. In our ICCS 2023 paper (Olsen et al., 2023) we built on that work by applying it to Wilensky’s Heatbugs model and training a random forest machine learning model to predict simulation results, using the covering arrays to select our training and test data. Our results show that a 2-way covering array provides sufficient training data for our random forest to predict three different simulation outcomes. Our process of using covering arrays to reduce the parameter space and then predicting ABM results with machine learning is therefore successful. In this paper, which extends the ICCS 2023 paper (Olsen et al., 2023), we analyze the role of parameter combinations and parameter values in determining model output via combination frequency difference (CFD) analysis and Shapley values. CFD has not previously been applied to agent-based models; we provide a process for using this approach and compare and contrast it with Shapley values and random forest feature importance.
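To make the idea of a 2-way covering array concrete, the following is a minimal sketch of a greedy pairwise construction. It is not the authors' tooling: the function name, the parameter names (param_a to param_d), and their levels are all hypothetical, and the greedy search enumerates the full factorial design, which is only practical for a toy example like this one.

```python
from itertools import combinations, product


def pairwise_covering_array(params):
    """Greedily build a 2-way covering array.

    Every pair of values from every two parameters appears in at least one row.
    """
    names = list(params)
    # Every ((param_a, value_a), (param_b, value_b)) pair that still needs a row.
    uncovered = {
        ((a, va), (b, vb))
        for a, b in combinations(names, 2)
        for va in params[a]
        for vb in params[b]
    }
    # Candidate rows: the full factorial design (fine for a toy example only).
    candidates = [dict(zip(names, values)) for values in product(*params.values())]

    def covers(row, pair):
        (a, va), (b, vb) = pair
        return row[a] == va and row[b] == vb

    rows = []
    while uncovered:
        # Take the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda row: sum(covers(row, p) for p in uncovered))
        rows.append(best)
        uncovered = {p for p in uncovered if not covers(best, p)}
    return rows


# Hypothetical parameters and levels, chosen only for illustration.
params = {
    "param_a": [10, 50, 100],
    "param_b": [0.0, 0.5, 1.0],
    "param_c": [0.1, 0.9],
    "param_d": [0, 50, 100],
}
rows = pairwise_covering_array(params)
full_factorial = 3 * 3 * 2 * 3
print(f"{len(rows)} covering-array runs instead of {full_factorial} full-factorial runs")
```

Each returned row is one simulation configuration. Running the ABM only on these rows still exercises every pair of parameter values at least once, which is the property the abstract relies on when selecting training data.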
About the journal:
Computational Science is a rapidly growing multi- and interdisciplinary field that uses advanced computing and data analysis to understand and solve complex problems. It has reached a level of predictive capability that now firmly complements the traditional pillars of experimentation and theory.
Recent advances in experimental techniques, such as detectors, on-line sensor networks and high-resolution imaging techniques, have opened up new windows into physical and biological processes at many levels of detail. The resulting data explosion allows for detailed data-driven modeling and simulation.
This new discipline in science combines computational thinking, modern computational methods, devices and collateral technologies to address problems far beyond the scope of traditional numerical methods.
Computational science typically unifies three distinct elements:
• Modeling, Algorithms and Simulations (e.g. numerical and non-numerical, discrete and continuous);
• Software developed to solve science (e.g., biological, physical, and social), engineering, medicine, and humanities problems;
• Computer and information science that develops and optimizes the advanced system hardware, software, networking, and data management components (e.g. problem solving environments).