Leveraging Simulation Data to Understand Bias in Predictive Models of Infectious Disease Spread

Impact factor: 1.2 | JCR quartile: Q4 (Remote Sensing)

ACM Transactions on Spatial Algorithms and Systems Pub Date : 2024-04-25 DOI:10.1145/3660631

Andreas Züfle, Flora Salim, Taylor Anderson, M. Scotch, Li Xiong, Kacper Sokol, Hao Xue, Ruochen Kong, David Heslop, Hye-Young Paik, C. R. MacIntyre


Citations: 0

Abstract

The spread of infectious diseases is a highly complex spatiotemporal process, difficult to understand, predict, and effectively respond to. Machine learning and artificial intelligence (AI) have achieved impressive results in other learning and prediction tasks; however, while many AI solutions are developed for disease prediction, only a few of them are adopted by decision-makers to support policy interventions. Among several issues preventing their uptake, AI methods are known to amplify the bias in the data they are trained on. This is especially problematic for infectious disease models that typically leverage large, open, and inherently biased spatiotemporal data. These biases may propagate through the modeling pipeline to decision-making, resulting in inequitable policy interventions. Therefore, there is a need to gain an understanding of how the AI disease modeling pipeline can mitigate biased input data, in-processing models, and biased outputs. Specifically, our vision is to develop a large-scale micro-simulation of individuals from which human mobility, population, and disease ground truth data can be obtained. From this complete dataset – which may not reflect the real world – we can sample and inject different types of bias. By using the sampled data in which bias is known (as it is given as the simulation parameter), we can explore how existing solutions for fairness in AI can mitigate and correct these biases and investigate novel AI fairness solutions. Achieving this vision would result in improved trust in such models for informing fair and equitable policy interventions.
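The idea of sampling from a complete simulated dataset with a known, injected bias can be illustrated with a minimal sketch. This is not the authors' micro-simulation: the two-group toy population, the per-group infection risks, the inclusion probabilities, and all function names below are hypothetical, and inverse-probability weighting is swapped in as one standard correction that becomes possible when the bias is known as a parameter.

```python
import random


def simulate_population(n, seed=0):
    """Toy 'ground truth': each person has a group label and an infection flag."""
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        group = "A" if rng.random() < 0.5 else "B"
        # Hypothetical per-group infection risk (illustrative, not from the paper).
        risk = 0.10 if group == "A" else 0.30
        population.append({"group": group, "infected": rng.random() < risk})
    return population


def biased_sample(population, inclusion_prob, seed=1):
    """Inject sampling bias: keep each person with a group-dependent probability."""
    rng = random.Random(seed)
    return [p for p in population if rng.random() < inclusion_prob[p["group"]]]


def prevalence(people):
    """Unweighted share of infected individuals."""
    return sum(p["infected"] for p in people) / len(people)


def weighted_prevalence(sample, inclusion_prob):
    """Inverse-probability-weighted prevalence, using the known bias parameter."""
    weights = [1.0 / inclusion_prob[p["group"]] for p in sample]
    infected_weight = sum(w for w, p in zip(weights, sample) if p["infected"])
    return infected_weight / sum(weights)


population = simulate_population(100_000)
# Known bias parameter: group B is under-represented in the observed data.
inclusion_prob = {"A": 0.8, "B": 0.2}
sample = biased_sample(population, inclusion_prob)

truth = prevalence(population)
naive = prevalence(sample)
corrected = weighted_prevalence(sample, inclusion_prob)
print(f"truth={truth:.3f} naive={naive:.3f} corrected={corrected:.3f}")
```

Under these assumed parameters the true prevalence is 0.20 in expectation, while the naive estimate from the biased sample is about 0.14, because the higher-risk group B is under-sampled. Since the inclusion probabilities are given rather than estimated, the gap between the naive and corrected estimates can be attributed directly to the injected bias.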


Source journal

ACM Transactions on Spatial Algorithms and Systems (Remote Sensing)

CiteScore: 4.40

Self-citation rate: 5.30%

Journal description: ACM Transactions on Spatial Algorithms and Systems (TSAS) is a scholarly journal that publishes the highest quality papers on all aspects of spatial algorithms and systems and closely related disciplines. It has a multi-disciplinary perspective in that it spans a large number of areas where spatial data is manipulated or visualized (regardless of how it is specified, i.e., geometrically or textually) such as geography, geographic information systems (GIS), geospatial and spatiotemporal databases, spatial and metric indexing, location-based services, web-based spatial applications, geographic information retrieval (GIR), spatial reasoning and mining, security and privacy, as well as the related visual computing areas of computer graphics, computer vision, geometric modeling, and visualization where the spatial, geospatial, and spatiotemporal data is central.