Containing a spread through sequential learning: to exploit or to explore?

Trans. Mach. Learn. Res. Pub Date: 2023-03-01 DOI: 10.48550/arXiv.2303.00141

Xingran Chen, Hesam Nikpey, Jungyeol Kim, S. Sarkar, S. S. Bidokhti


Citations: 0

Abstract



The spread of an undesirable contact process, such as an infectious disease (e.g., COVID-19), is contained through testing and isolation of infected nodes. The temporal and spatial evolution of the process (along with containment through isolation) renders such detection fundamentally different from active search detection strategies. In this work, through an active learning approach, we design testing and isolation strategies to contain the spread and minimize the cumulative infections under a given test budget. We prove that the objective can be optimized, with performance guarantees, by greedily selecting the nodes to test. We further design reward-based methodologies that effectively minimize an upper bound on the cumulative infections and are computationally more tractable in large networks. These policies, however, need knowledge of the nodes' infection probabilities, which change dynamically and have to be learned through sequential testing. We develop a message-passing framework for this purpose and, building on it, show novel tradeoffs between exploitation of knowledge through reward-based heuristics and exploration of the unknown through carefully designed probabilistic testing. The tradeoffs are fundamentally distinct from their classical counterparts in active search or multi-armed bandit problems (MABs). We prove the necessity of exploration in a stylized network and show through simulations that exploration can outperform exploitation in various synthetic and real-data networks, depending on the parameters of the network and the spread.
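
As a rough illustration of the exploit-versus-explore split described in the abstract, the sketch below divides a per-round test budget between the nodes with the highest estimated infection probability (exploitation) and nodes drawn at random from the rest (exploration). This is not the authors' policy: the abstract does not specify the reward function, the message-passing estimator, or the probabilistic testing rule, so the function name `select_nodes_to_test`, the `explore_frac` parameter, and the uniform random exploration are all assumptions made for this toy example.

```python
import random
from typing import Dict, List, Optional


def select_nodes_to_test(
    est_prob: Dict[str, float],
    budget: int,
    explore_frac: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick up to `budget` nodes to test in one round (hypothetical helper).

    est_prob     -- current estimate of each node's infection probability
                    (assumed to be supplied by some external estimator)
    explore_frac -- share of the budget spent on uniformly random nodes;
                    0.0 is pure exploitation (an assumption, not the paper's rule)
    """
    rng = rng or random.Random()
    budget = min(budget, len(est_prob))

    # Split the budget; the exploration share is rounded down.
    n_explore = int(explore_frac * budget)
    n_exploit = budget - n_explore

    # Exploitation: nodes ranked by estimated infection probability.
    ranked = sorted(est_prob, key=est_prob.get, reverse=True)
    exploit = ranked[:n_exploit]

    # Exploration: uniform sample from the nodes not already chosen.
    explore = rng.sample(ranked[n_exploit:], n_explore)
    return exploit + explore


estimates = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7}
greedy_pick = select_nodes_to_test(estimates, budget=2)
mixed_pick = select_nodes_to_test(
    estimates, budget=2, explore_frac=0.5, rng=random.Random(0)
)
```

With `explore_frac=0.0` the selection is purely greedy on the current estimates; raising it spends part of the budget on nodes whose estimates may be stale, which is the kind of tradeoff the abstract refers to.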
