Hierarchical Imitation Learning via Subgoal Representation Learning for Dynamic Treatment Recommendation

Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining. Publication date: 2022-02-11. DOI: 10.1145/3488560.3498535

Lu Wang, Ruiming Tang, Xiaofeng He, Xiuqiang He

Citations: 8

Abstract

Dynamic Treatment Recommendation (DTR) is a sequence of tailored treatment decision rules that can be grouped into individual sub-tasks. Because reward signals in DTR are hard to design, Imitation Learning (IL) has been highly successful in this setting: it mimics doctors' behavior from their demonstrations without requiring explicit reward signals. Since a patient may have several different symptoms, the behaviors in doctors' demonstrations can often be grouped according to the individual symptom they address. However, a single flat policy learned by IL struggles to mimic demonstrations with such a hierarchical structure, in which high-level decisions switch the low-level behavior from one symptom to another. Motivated by this observation, we consider Hierarchical Imitation Learning (HIL) methods a good fit for DTR. In this paper, we propose SHIL, a novel Subgoal-conditioned HIL framework in which a high-level policy sequentially sets a subgoal for each sub-task without prior knowledge, and a low-level policy is learned to reach that subgoal. To remove the need for prior knowledge, we propose a self-supervised learning method that learns an effective representation for each subgoal; in particular, it is designed to encourage diverse representations across different subgoals. To demonstrate that SHIL learns meaningful high-level and low-level policies that accurately reproduce complex doctors' demonstrations, we conduct experiments on MIMIC-III, a real-world medical dataset from the healthcare domain. Compared with state-of-the-art baselines, SHIL improves the likelihood of patient survival by a significant margin and provides explainable recommendations with a hierarchical structure.
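The abstract describes the two-level structure only in words. The sketch below illustrates that structure under explicit assumptions: a high-level policy selects one of several subgoal embeddings, a low-level policy chooses a treatment action conditioned on the state and the selected subgoal, and a regularizer discourages different subgoals from collapsing onto the same representation. Every name here (`high_level_policy`, `low_level_policy`, `diversity_penalty`, `imitation_nll`, `objective`, `subgoal_segments`, the weight `lam`) is hypothetical, and the dot-product scoring rule, the behavior-cloning likelihood, and the mean pairwise cosine-similarity penalty are illustrative stand-ins. The abstract does not specify SHIL's actual networks or objectives, so this should not be read as the paper's method.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def high_level_policy(state: Vector, subgoals: Sequence[Vector]) -> int:
    """Hypothetical high-level policy: index of the subgoal scoring highest against the state."""
    scores = [dot(state, g) for g in subgoals]
    return max(range(len(scores)), key=lambda k: scores[k])


def low_level_policy(state: Vector, subgoal: Vector,
                     action_weights: Sequence[Vector]) -> List[float]:
    """Hypothetical low-level policy: action distribution conditioned on [state; subgoal]."""
    x = list(state) + list(subgoal)
    return softmax([dot(w, x) for w in action_weights])


def diversity_penalty(subgoals: Sequence[Vector]) -> float:
    """Mean pairwise cosine similarity; minimizing it pushes subgoal representations apart."""
    k = len(subgoals)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    if not pairs:
        return 0.0
    return sum(cosine(subgoals[i], subgoals[j]) for i, j in pairs) / len(pairs)


def imitation_nll(states: Sequence[Vector], doctor_actions: Sequence[int],
                  subgoals: Sequence[Vector], action_weights: Sequence[Vector]) -> float:
    """Behavior-cloning stand-in: mean negative log-likelihood of the doctors' actions."""
    total = 0.0
    for s, a in zip(states, doctor_actions):
        g = subgoals[high_level_policy(s, subgoals)]
        total -= math.log(low_level_policy(s, g, action_weights)[a])
    return total / len(states)


def objective(states, doctor_actions, subgoals, action_weights, lam: float = 0.1) -> float:
    """Hypothetical combined loss: imitation term plus lam times the diversity regularizer."""
    return (imitation_nll(states, doctor_actions, subgoals, action_weights)
            + lam * diversity_penalty(subgoals))


def subgoal_segments(states: Sequence[Vector],
                     subgoals: Sequence[Vector]) -> List[Tuple[int, int, int]]:
    """Group consecutive steps sharing a subgoal into (subgoal, start, end) spans, end exclusive."""
    segments: List[List[int]] = []
    for t, s in enumerate(states):
        k = high_level_policy(s, subgoals)
        if segments and segments[-1][0] == k:
            segments[-1][2] = t + 1  # same subgoal: extend the current sub-task
        else:
            segments.append([k, t, t + 1])  # high-level policy switched subgoal
    return [(k, start, end) for k, start, end in segments]


if __name__ == "__main__":
    subgoals = [[1.0, 0.0], [0.0, 1.0]]  # toy, orthogonal subgoal embeddings
    weights = [[1.0, 0.0, 1.0, 0.0],     # action 0 favored under subgoal 0
               [0.0, 1.0, 0.0, 1.0]]     # action 1 favored under subgoal 1
    states = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.8]]
    print(subgoal_segments(states, subgoals))  # → [(0, 0, 2), (1, 2, 3)]
    print(objective(states, [0, 0, 1], subgoals, weights))
```

Re-selecting the subgoal at every step lets the high-level choice switch between sub-tasks, and grouping consecutive steps that share a subgoal (as `subgoal_segments` does) is one simple way to expose a hierarchical, inspectable structure of the kind the abstract calls explainable. In a trained model the embeddings and weights would be learned parameters rather than fixed toy values.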
