Policy Design for Active Sequential Hypothesis Testing using Deep Learning

D. Kartik, Ekraam Sabir, U. Mitra, P. Natarajan
{"title":"Policy Design for Active Sequential Hypothesis Testing using Deep Learning","authors":"D. Kartik, Ekraam Sabir, U. Mitra, P. Natarajan","doi":"10.1109/ALLERTON.2018.8636086","DOIUrl":null,"url":null,"abstract":"Information theory has been very successful in obtaining performance limits for various problems such as communication, compression and hypothesis testing. Likewise, stochastic control theory provides a characterization of optimal policies for Partially Observable Markov Decision Processes (POMDPs) using dynamic programming. However, finding optimal policies for these problems is computationally hard in general and thus, heuristic solutions are employed in practice. Deep learning can be used as a tool for designing better heuristics in such problems. In this paper, the problem of active sequential hypothesis testing is considered. The goal is to design a policy that can reliably infer the true hypothesis using as few samples as possible by adaptively selecting appropriate queries. This problem can be modeled as a POMDP and bounds on its value function exist in literature. However, optimal policies have not been identified and various heuristics are used. In this paper, two new heuristics are proposed: one based on deep reinforcement learning and another based on a KL-divergence zero-sum game. These heuristics are compared with state-of-the-art solutions and it is demonstrated using numerical experiments that the proposed heuristics can achieve significantly better performance than existing methods in some scenarios.","PeriodicalId":299280,"journal":{"name":"2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ALLERTON.2018.8636086","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

Abstract

Information theory has been very successful in obtaining performance limits for various problems such as communication, compression, and hypothesis testing. Likewise, stochastic control theory provides a characterization of optimal policies for Partially Observable Markov Decision Processes (POMDPs) using dynamic programming. However, finding optimal policies for these problems is computationally hard in general, and thus heuristic solutions are employed in practice. Deep learning can be used as a tool for designing better heuristics in such problems. In this paper, the problem of active sequential hypothesis testing is considered. The goal is to design a policy that can reliably infer the true hypothesis using as few samples as possible by adaptively selecting appropriate queries. This problem can be modeled as a POMDP, and bounds on its value function exist in the literature. However, optimal policies have not been identified and various heuristics are used. In this paper, two new heuristics are proposed: one based on deep reinforcement learning and the other based on a KL-divergence zero-sum game. These heuristics are compared with state-of-the-art solutions, and numerical experiments demonstrate that the proposed heuristics can achieve significantly better performance than existing methods in some scenarios.
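The query-selection loop the abstract describes can be sketched in a few lines. The Python below is a minimal illustration, not the paper's method: it pairs Bayesian belief updates with a greedy expected-information-gain rule (a standard baseline for active hypothesis testing), and the likelihood tensor p[h, q, y], the 0.99 confidence threshold, and all dimensions are hypothetical stand-ins. The deep-RL and KL-divergence-game policies proposed in the paper would replace the greedy query-selection rule.

    import numpy as np

    # Illustrative sketch only: the likelihood tensor, the threshold, and
    # the greedy information-gain rule are hypothetical stand-ins, not the
    # paper's proposed policies.
    rng = np.random.default_rng(0)
    num_hyp, num_query, num_obs = 3, 4, 2

    # p[h, q, y]: probability of observing y under query q when
    # hypothesis h is true.
    p = rng.dirichlet(np.ones(num_obs), size=(num_hyp, num_query))

    true_h = 1                            # ground truth, unknown to the agent
    belief = np.ones(num_hyp) / num_hyp   # uniform prior over hypotheses

    def entropy(b):
        b = b[b > 0]
        return -np.sum(b * np.log(b))

    def info_gain(belief, q):
        # Expected reduction in belief entropy if query q is asked.
        gain = entropy(belief)
        for y in range(num_obs):
            m = np.sum(belief * p[:, q, y])   # marginal probability of y
            if m > 0:
                gain -= m * entropy(belief * p[:, q, y] / m)
        return gain

    for t in range(100):
        q = max(range(num_query), key=lambda a: info_gain(belief, a))
        y = rng.choice(num_obs, p=p[true_h, q])   # sample an observation
        belief = belief * p[:, q, y]              # Bayes update
        belief /= belief.sum()
        if belief.max() > 0.99:                   # hypothetical stopping threshold
            break

    print("declared hypothesis:", belief.argmax(), "after", t + 1, "samples")

In this sketch the greedy rule plays the role that a learned policy plays in the paper: all of them are maps from the current belief to the next query, and the stopping rule trades off sample count against reliability.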