A Neural Candidate-Selector Architecture for Automatic Structured Clinical Text Annotation.

Gaurav Singh, Iain J Marshall, James Thomas, John Shawe-Taylor, Byron C Wallace
{"title":"A Neural Candidate-Selector Architecture for Automatic Structured Clinical Text Annotation.","authors":"Gaurav Singh, Iain J Marshall, James Thomas, John Shawe-Taylor, Byron C Wallace","doi":"10.1145/3132847.3132989","DOIUrl":null,"url":null,"abstract":"<p><p>We consider the task of automatically annotating free texts describing clinical trials with concepts from a controlled, structured medical vocabulary. Specifically we aim to build a model to infer distinct sets of (ontological) concepts describing complementary clinically salient aspects of the underlying trials: the populations enrolled, the interventions administered and the outcomes measured, i.e., the <i>PICO</i> elements. This important practical problem poses a few key challenges. One issue is that the output space is vast, because the vocabulary comprises many unique concepts. Compounding this problem, annotated data in this domain is expensive to collect and hence sparse. Furthermore, the outputs (sets of concepts for each PICO element) are correlated: specific populations (e.g., diabetics) will render certain intervention concepts likely (insulin therapy) while effectively precluding others (radiation therapy). Such correlations should be exploited. We propose a novel neural model that addresses these challenges. We introduce a Candidate-Selector architecture in which the model considers setes of <i>candidate concepts</i> for PICO elements, and assesses their plausibility conditioned on the input text to be annotated. This relies on a 'candidate set' generator, which may be learned or relies on heuristics. A conditional discriminative neural model then jointly selects candidate concepts, given the input text. We compare the predictive performance of our approach to strong baselines, and show that it outperforms them. Finally, we perform a qualitative evaluation of the generated annotations by asking domain experts to assess their quality.</p>","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"2017 ","pages":"1519-1528"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5752318/pdf/nihms927025.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3132847.3132989","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We consider the task of automatically annotating free texts describing clinical trials with concepts from a controlled, structured medical vocabulary. Specifically we aim to build a model to infer distinct sets of (ontological) concepts describing complementary clinically salient aspects of the underlying trials: the populations enrolled, the interventions administered and the outcomes measured, i.e., the PICO elements. This important practical problem poses a few key challenges. One issue is that the output space is vast, because the vocabulary comprises many unique concepts. Compounding this problem, annotated data in this domain is expensive to collect and hence sparse. Furthermore, the outputs (sets of concepts for each PICO element) are correlated: specific populations (e.g., diabetics) will render certain intervention concepts likely (insulin therapy) while effectively precluding others (radiation therapy). Such correlations should be exploited. We propose a novel neural model that addresses these challenges. We introduce a Candidate-Selector architecture in which the model considers setes of candidate concepts for PICO elements, and assesses their plausibility conditioned on the input text to be annotated. This relies on a 'candidate set' generator, which may be learned or relies on heuristics. A conditional discriminative neural model then jointly selects candidate concepts, given the input text. We compare the predictive performance of our approach to strong baselines, and show that it outperforms them. Finally, we perform a qualitative evaluation of the generated annotations by asking domain experts to assess their quality.

Abstract Image

Abstract Image

Abstract Image

用于自动结构化临床文本注释的候选者-选择器神经架构
我们考虑的任务是用受控结构化医学词汇中的概念自动注释描述临床试验的自由文本。具体来说,我们的目标是建立一个模型,以推断出不同的(本体论)概念集,这些概念集描述了相关试验的互补性临床突出方面:入组人群、实施的干预措施和测量的结果,即 PICO 要素。这个重要的实际问题提出了几个关键挑战。一个问题是,由于词汇包含许多独特的概念,因此输出空间非常大。使这一问题更加复杂的是,该领域的注释数据收集成本高昂,因此数量稀少。此外,输出结果(每个 PICO 要素的概念集)是相互关联的:特定人群(如糖尿病患者)可能会使用某些干预概念(胰岛素疗法),同时有效地排除其他概念(放射疗法)。这种相关性应该加以利用。我们提出了一种新型神经模型来应对这些挑战。我们引入了 "候选-选择器 "架构,在该架构中,模型会考虑 PICO 要素的候选概念集,并根据要注释的输入文本评估其合理性。这依赖于一个 "候选集 "生成器,该生成器可以是学习得来的,也可以依赖于启发式方法。然后,一个条件判别神经模型会根据输入文本共同选择候选概念。我们将我们的方法与强大的基线进行了预测性能比较,结果表明我们的方法优于它们。最后,我们请领域专家对生成的注释质量进行了定性评估。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信