Refining health outcomes of interest using formal concept analysis and semantic query expansion
Olivier Curé, H. Maurer, N. Shah, P. LePendu
Data and Text Mining in Bioinformatics, 2013-11-01
DOI: 10.1145/2512089.2512095 (https://doi.org/10.1145/2512089.2512095)
Citation count: 3
Abstract
Clinicians and researchers using Electronic Health Records (EHRs) often search for, extract, and analyze groups of patients by defining a Health Outcome of Interest (HOI), which may include a set of diseases, conditions, signs, or symptoms. In our work on pharmacovigilance using clinical notes, for example, we use a method that operates over many (potentially hundreds of) ontologies at once, expands the input query, and increases the search space over clinical text as well as structured data. This method requires specifying an initial set of seed concepts, based on concept unique identifiers from the UMLS Metathesaurus. In some cases, such as progressive multifocal leukoencephalopathy, the seed query is easy to specify, but in others, such as chronic obstructive pulmonary disease, the task is more subtle and requires intensive manual work. The challenge in defining an HOI arises because medical and health terminologies are numerous and complex. We have developed a method that combines Semantic Query Expansion, to leverage the hierarchical structure of ontologies, with Formal Concept Analysis, to organize, reason over, and prune discovered concepts efficiently across a large number of ontologies. Together, they assist the user, through a RESTful API and a web-based graphical user interface, in defining the seed query and in refining the expanded search space that it encompasses. In this context, end-user interactions mainly consist of accepting or rejecting system propositions, and the user can stop at any time. We use this approach for text-mining clinical notes from EHRs, but it is equally applicable to cohort-building tools in general. A preliminary evaluation of this work on the i2b2 Obesity NLP reference set shows positive results for sensitivity and specificity, slightly improving on existing results for this gold standard.
This experiment also shows that our semi-automatic approach provides fast processing times (on the order of milliseconds to a few seconds) for the generation of several thousand potential terms. The most promising aspect of this approach is the discovery of potentially positive results among the false-negative concepts discovered by our method. In future work, we aim to conduct a user-driven evaluation of the Web interface, analyze physicians' acceptance and rejection decisions in several practical scenarios, and use active learning over past query refinements to improve future queries.
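
The interplay between the two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy hierarchy, the ontology names (`ontoA`, `ontoB`), and the functions `expand`, `formal_concepts`, and `refine` are all hypothetical, and the choice of "ontologies containing a term" as the formal-context attributes is an assumption made only for illustration. The sketch expands a seed concept down a hierarchy, builds the formal concepts of the resulting term/ontology context by brute force, and keeps only the terms of concepts the user accepts.

```python
from itertools import combinations

# Hypothetical toy hierarchy (parent -> children); not real UMLS identifiers.
CHILDREN = {
    "lung_disease": ["copd", "asthma"],
    "copd": ["emphysema", "chronic_bronchitis"],
}

# Hypothetical formal context: term -> ontologies assumed to contain it.
ONTOLOGIES = {
    "copd": {"ontoA", "ontoB"},
    "emphysema": {"ontoA", "ontoB"},
    "chronic_bronchitis": {"ontoA"},
}

def expand(seeds):
    """Semantic query expansion: each seed plus all of its descendants."""
    seen, stack = set(), list(seeds)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(CHILDREN.get(term, []))
    return seen

def formal_concepts(context):
    """Brute-force FCA: every (extent, intent) pair closed under derivation.

    Exponential in the number of terms, so suitable for toy inputs only.
    """
    all_attrs = set().union(*context.values())
    terms = sorted(context)
    concepts = set()
    for size in range(len(terms) + 1):
        for subset in combinations(terms, size):
            # Intent: attributes shared by every term in the subset.
            if subset:
                intent = set.intersection(*(context[t] for t in subset))
            else:
                intent = set(all_attrs)
            # Extent: every term that carries all attributes of the intent.
            extent = {t for t, attrs in context.items() if intent <= attrs}
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

def refine(concepts, accepted_intents):
    """Keep the terms of the concepts whose intent the user accepted."""
    kept = set()
    for extent, intent in concepts:
        if intent in accepted_intents:
            kept |= extent
    return kept

terms = expand({"copd"})
context = {t: ONTOLOGIES.get(t, set()) for t in terms}
concepts = formal_concepts(context)
refined = refine(concepts, {frozenset({"ontoA", "ontoB"})})
```

In this toy run the seed `copd` expands to three terms, the context yields two formal concepts, and accepting only the concept shared by both ontologies prunes `chronic_bronchitis` from the query. A practical system would use an incremental lattice algorithm instead of enumerating all subsets.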