Generating Reward Functions Using IRL Towards Individualized Cancer Screening.

Panayiotis Petousis, Simon X Han, William Hsu, Alex A T Bui
DOI: 10.1007/978-3-030-12738-1_16
Journal: Artificial Intelligence in Health: First International Workshop, AIH 2018, Stockholm, Sweden, July 13-14, 2018, Revised Selected Papers
Volume: 11326, Pages: 213-227
Published: 2019-01-01 (Epub 2019-02-21)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6667225/pdf/nihms-1040066.pdf
Citations: 0

Abstract



Cancer screening can benefit from individualized decision-making tools that decrease overdiagnosis. The heterogeneity of cancer screening participants underscores the need for more personalized methods. Partially observable Markov decision processes (POMDPs), when defined with an appropriate reward function, can be used to suggest optimal, individualized screening policies. However, determining an appropriate reward function can be challenging. Here, we propose the use of inverse reinforcement learning (IRL) to form reward functions for lung and breast cancer screening POMDPs. Using experts' (physicians') retrospective screening decisions for lung and breast cancer screening, we developed two POMDP models with corresponding reward functions. Specifically, the maximum entropy (MaxEnt) IRL algorithm with an adaptive step size was employed to learn rewards more efficiently, and was combined with a multiplicative model to learn state-action pair rewards for a POMDP. The POMDP screening models were evaluated based on their ability to recommend appropriate screening decisions before the diagnosis of cancer. The reward functions learned with the MaxEnt IRL algorithm, when combined with POMDP models in lung and breast cancer screening, demonstrate performance comparable to that of experts. The Cohen's Kappa score of agreement between the POMDPs' and physicians' predictions was high in breast cancer and had a decreasing trend in lung cancer.
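To illustrate the core idea the abstract describes — learning a linear reward from expert demonstrations with MaxEnt IRL and an adaptive step size — here is a minimal sketch of the one-step (bandit) special case, where the MaxEnt policy reduces to a softmax over rewards and the gradient is the gap between expert and policy feature expectations. This is an illustrative toy, not the paper's method: the feature matrix, demonstration data, and the AdaGrad-style adaptive step are all assumptions; the paper's full approach operates on POMDP state-action pairs with a multiplicative model.

```python
import numpy as np

# One-step special case of MaxEnt IRL: with a linear reward
# r(a) = theta . phi(a), the MaxEnt policy is P(a) ∝ exp(r(a)), and the
# log-likelihood gradient is the empirical (expert) feature expectation
# minus the feature expectation under the current policy.

def maxent_irl(phi, expert_actions, iters=2000, eps=1e-8):
    """phi: (n_actions, n_features) feature matrix;
    expert_actions: observed expert action indices (the demonstrations)."""
    n_actions, n_features = phi.shape
    theta = np.zeros(n_features)
    f_expert = phi[expert_actions].mean(axis=0)  # empirical feature expectation
    g2 = np.zeros(n_features)                    # accumulated squared gradients
    for _ in range(iters):
        r = phi @ theta
        p = np.exp(r - r.max())
        p /= p.sum()                             # MaxEnt (softmax) policy
        grad = f_expert - p @ phi                # MaxEnt IRL gradient
        g2 += grad ** 2
        theta += 0.5 / np.sqrt(g2 + eps) * grad  # AdaGrad-style adaptive step
    return theta

# Toy demonstration: 3 actions with one-hot features; the "expert" mostly
# picks action 0, so the learned reward should rank action 0 highest.
phi = np.eye(3)
demos = np.array([0, 0, 0, 0, 1, 0, 0, 2, 0, 0])
theta = maxent_irl(phi, demos)
print(theta.argmax())  # index of the action favored by the learned reward
```

The adaptive step here shrinks the learning rate per coordinate as gradients accumulate, one simple way to "learn rewards more efficiently" than a fixed step; the paper's specific step-size schedule is not reproduced.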
