Mining Medical Causality for Diagnosis Assistance

Sendong Zhao
Published in: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017-02-02
DOI: 10.1145/3018661.3022752 (https://doi.org/10.1145/3018661.3022752)
Citations: 5

Abstract

In the medical context, causal knowledge usually refers to causal relations between diseases and symptoms, living habits and diseases, symptom improvement and therapy, drugs and side effects, etc. [3]. These causal relations typically appear in medical literature, forums, and clinical cases, and they form the core of medical diagnosis. Mining this causal knowledge to predict diseases and recommend therapies is therefore of great value to both patients and professionals. The task of mining causal knowledge for diagnosis assistance can be decomposed into four components: (1) mining medical causality from text; (2) measuring medical treatment effectiveness; (3) disease prediction; and (4) explainable medical treatment recommendation. However, these tasks have never been studied systematically. For my PhD thesis, I plan to formally define the problem of mining medical-domain causality for diagnosis assistance and to propose methods for solving it.

1. Mining textual causalities can be very useful for discovering new knowledge and making decisions. Many studies have addressed causality extraction from text [1, 4, 5]. However, all of them rely on patterns or causal triggers, which greatly limits their power to extract causality, and they rarely consider co-occurrence frequency or contextual semantic features. In addition, none of them applies the transitivity rules of causality, so they miss causal relations that could easily be obtained by simple inference. We therefore formally define the task of mining causality via the frequency of event co-occurrence, the semantic distance between event pairs, and the transitivity rules of causality, and we present a factor graph that combines these three resources for causality mining.

2. Treatment effectiveness analysis is usually treated as a subset of causal analysis on observational data. For real observational data, propensity score matching (PSM) and the Rubin causal model (RCM) are the two dominant methods. On the one hand, PSM usually struggles to find matched cases because symptoms are sparse. On the other hand, applying RCM requires checking every possible (symptom, treatment) pair, so the number of checks explodes, especially when we want to test the causal relation between a combination of symptoms and a combination of drugs. Moreover, the more symptoms or treatments a combination contains, the fewer patient cases are retrieved, which leads to a lack of statistical significance. In Traditional Chinese Medicine (TCM) in particular, patients tend to take tens of herbs in a single treatment. Evaluating the effectiveness of herbs both separately and jointly is therefore a major challenge. It is also a fundamental research topic that supports many downstream applications.

3. Both hospitals and online forums have accumulated large volumes of records, such as clinical text data and online diagnosis Q&A pairs. The availability of such data at scale enables automatic disease prediction. Some papers address disease prediction with electronic health records (EHR) [2], but research on disease prediction from raw symptoms remains necessary and challenging. We therefore propose a general idea: use the rich contextual information of diseases and symptoms to bridge the gap between disease candidates and symptoms. We keep this idea separate from its specific implementation, which uses network embedding.

4. Recommendation in the medical domain is usually a decision-making problem, which requires the ability to explain "why". This ability basically comes from two paths. Suppose a recommendation suggests that you eat more vegetables. You will probably not believe it if nothing is attached. But if the recommendation gives stated reasons why eating more vegetables is good, you might be willing to accept it. Now consider another scenario: if the recommendation shows contrast data indicating that people who eat more vegetables are healthier than those who eat fewer, you would almost certainly accept it as well. Based on these two intuitions, we present a recommendation model based on proofs, which are either stated reasons or differences found by contrast.

This work was supported by the 973 program (No. 2014CB340503) and the NSFC (No. 61133012 and No. 61472107).
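
The growth in the number of checks described in item 2 can be illustrated with a minimal Python sketch. It is not the author's method: the function names, the size cap `max_size`, and the toy records are all hypothetical, chosen only to show how the number of (symptom combination, treatment combination) pairs grows while the number of supporting patient cases shrinks.

```python
from math import comb


def count_combination_pairs(n_symptoms: int, n_treatments: int, max_size: int) -> int:
    """Count (symptom set, treatment set) pairs with 1..max_size items per set."""
    symptom_sets = sum(comb(n_symptoms, k) for k in range(1, max_size + 1))
    treatment_sets = sum(comb(n_treatments, k) for k in range(1, max_size + 1))
    return symptom_sets * treatment_sets


def matching_cases(records, symptoms, treatments):
    """Return the records that contain every given symptom and every given treatment."""
    wanted_symptoms, wanted_treatments = set(symptoms), set(treatments)
    return [
        r for r in records
        if wanted_symptoms <= r["symptoms"] and wanted_treatments <= r["treatments"]
    ]


# Hypothetical toy records, invented purely for illustration.
toy_records = [
    {"symptoms": {"cough", "fever"}, "treatments": {"herb_a", "herb_b"}},
    {"symptoms": {"cough"}, "treatments": {"herb_a"}},
    {"symptoms": {"cough", "fever", "headache"}, "treatments": {"herb_a"}},
]

if __name__ == "__main__":
    # The number of pairs to check grows quickly with the allowed combination size.
    for size in (1, 2, 3):
        print(size, count_combination_pairs(50, 100, size))

    # Larger combinations are supported by fewer patient cases.
    print(len(matching_cases(toy_records, ["cough"], ["herb_a"])))
    print(len(matching_cases(toy_records, ["cough", "fever"], ["herb_a", "herb_b"])))
```

Capping the combination size is only one way to keep the count finite; without a cap, n symptoms and m treatments give (2^n - 1)(2^m - 1) non-empty pairs.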