Refining health outcomes of interest using formal concept analysis and semantic query expansion

Data and Text Mining in Bioinformatics Pub Date : 2013-11-01 DOI:10.1145/2512089.2512095

Olivier Curé, H. Maurer, N. Shah, P. LePendu


Citations: 3

Abstract

Clinicians and researchers using Electronic Health Records (EHRs) often search for, extract, and analyze groups of patients by defining a Health Outcome of Interest (HOI), which may include a set of diseases, conditions, signs, or symptoms. In our work on pharmacovigilance using clinical notes, for example, we use a method that operates over many (potentially hundreds of) ontologies at once, expands the input query, and increases the search space over clinical text as well as structured data. This method requires specifying an initial set of seed concepts, based on concept unique identifiers from the UMLS Metathesaurus. In some cases, such as progressive multifocal leukoencephalopathy, the seed query is easy to specify; in others, such as chronic obstructive pulmonary disease, the task is more subtle and requires intensive manual work. The challenge in defining an HOI arises because medical and health terminologies are numerous and complex. We have developed a method that combines Semantic Query Expansion, which leverages the hierarchical structure of ontologies, with Formal Concept Analysis, which organizes, reasons over, and prunes discovered concepts efficiently across a large number of ontologies. Together, they assist the user, through a RESTful API and a web-based graphical user interface, in defining a seed query and in refining the expanded search space that it encompasses. End-user interaction mainly consists of accepting or rejecting the system's propositions, and the user can stop at any time. We use this approach for text-mining clinical notes from EHRs, but it is equally applicable to cohort-building tools in general. A preliminary evaluation on the i2b2 Obesity NLP reference set shows positive results for sensitivity and specificity, slightly improving on existing results for this gold standard. The experiment also shows that our semi-automatic approach is fast, with processing times on the order of milliseconds to a few seconds for generating several thousand potential terms. The most promising aspect of this approach is the discovery of potentially positive results from false negative concepts discovered by our method. In future work, we aim to conduct a user-driven evaluation of the Web interface, analyze physicians' acceptance and rejection of propositions in several practical scenarios, and use active learning over past query refinements to improve future queries.
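
The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hierarchy, the ontology labels (`ONT_A`, `ONT_B`), and the function names `expand` and `formal_concepts` are hypothetical, and the concept enumeration is a naive one that only suits toy inputs. The first function expands a seed term to its descendants in a hierarchy; the second builds the formal concepts of a context that maps each expanded term to the ontologies assumed to contain it.

```python
from itertools import combinations

# Hypothetical toy hierarchy (parent -> children); not real UMLS content.
HIERARCHY = {
    "lung disease": ["obstructive lung disease", "pneumonia"],
    "obstructive lung disease": ["COPD", "asthma"],
    "COPD": ["emphysema", "chronic bronchitis"],
}


def expand(seeds, hierarchy):
    """Semantic query expansion: return the seeds plus all their descendants."""
    seen = set()
    stack = list(seeds)
    while stack:
        term = stack.pop()
        if term in seen:
            continue
        seen.add(term)
        stack.extend(hierarchy.get(term, []))
    return seen


def formal_concepts(context):
    """Naively enumerate the formal concepts of a context.

    `context` maps each object (term) to the set of attributes (ontologies)
    it has. A formal concept is a pair (extent, intent) in which the extent
    is exactly the set of objects sharing every attribute of the intent, and
    the intent is exactly the set of attributes shared by every object of
    the extent. Exponential in the number of attributes.
    """
    attributes = set()
    for attrs in context.values():
        attributes |= attrs
    concepts = set()
    for size in range(len(attributes) + 1):
        for subset in combinations(sorted(attributes), size):
            chosen = set(subset)
            # Objects that carry every chosen attribute.
            extent = frozenset(o for o, attrs in context.items() if chosen <= attrs)
            # Attributes common to all of those objects (closure of `chosen`).
            intent = frozenset(
                a for a in attributes if all(a in context[o] for o in extent)
            )
            concepts.add((extent, intent))
    return concepts


expanded = expand({"COPD"}, HIERARCHY)

# Hypothetical assignment of expanded terms to the ontologies containing them.
context = {
    "COPD": {"ONT_A", "ONT_B"},
    "emphysema": {"ONT_A"},
    "chronic bronchitis": {"ONT_B"},
}
concepts = formal_concepts(context)

for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

In an interactive setting such as the one the abstract describes, each concept would be a natural unit to present to the user for acceptance or rejection, since it groups terms that share the same set of source ontologies.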

