用于自动结构化临床文本注释的候选者-选择器神经架构

Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management Pub Date : 2017-11-01 DOI:10.1145/3132847.3132989

Gaurav Singh, Iain J Marshall, James Thomas, John Shawe-Taylor, Byron C Wallace

{"title":"用于自动结构化临床文本注释的候选者-选择器神经架构","authors":"Gaurav Singh, Iain J Marshall, James Thomas, John Shawe-Taylor, Byron C Wallace","doi":"10.1145/3132847.3132989","DOIUrl":null,"url":null,"abstract":"We consider the task of automatically annotating free texts describing clinical trials with concepts from a controlled, structured medical vocabulary. Specifically we aim to build a model to infer distinct sets of (ontological) concepts describing complementary clinically salient aspects of the underlying trials: the populations enrolled, the interventions administered and the outcomes measured, i.e., the PICO elements. This important practical problem poses a few key challenges. One issue is that the output space is vast, because the vocabulary comprises many unique concepts. Compounding this problem, annotated data in this domain is expensive to collect and hence sparse. Furthermore, the outputs (sets of concepts for each PICO element) are correlated: specific populations (e.g., diabetics) will render certain intervention concepts likely (insulin therapy) while effectively precluding others (radiation therapy). Such correlations should be exploited. We propose a novel neural model that addresses these challenges. We introduce a Candidate-Selector architecture in which the model considers setes of candidate concepts for PICO elements, and assesses their plausibility conditioned on the input text to be annotated. This relies on a 'candidate set' generator, which may be learned or relies on heuristics. A conditional discriminative neural model then jointly selects candidate concepts, given the input text. We compare the predictive performance of our approach to strong baselines, and show that it outperforms them. Finally, we perform a qualitative evaluation of the generated annotations by asking domain experts to assess their quality.","PeriodicalId":74507,"journal":{"name":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","volume":"2017 ","pages":"1519-1528"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5752318/pdf/nihms927025.pdf","citationCount":"0","resultStr":"{\"title\":\"A Neural Candidate-Selector Architecture for Automatic Structured Clinical Text Annotation.\",\"authors\":\"Gaurav Singh, Iain J Marshall, James Thomas, John Shawe-Taylor, Byron C Wallace\",\"doi\":\"10.1145/3132847.3132989\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider the task of automatically annotating free texts describing clinical trials with concepts from a controlled, structured medical vocabulary. Specifically we aim to build a model to infer distinct sets of (ontological) concepts describing complementary clinically salient aspects of the underlying trials: the populations enrolled, the interventions administered and the outcomes measured, i.e., the PICO elements. This important practical problem poses a few key challenges. One issue is that the output space is vast, because the vocabulary comprises many unique concepts. Compounding this problem, annotated data in this domain is expensive to collect and hence sparse. Furthermore, the outputs (sets of concepts for each PICO element) are correlated: specific populations (e.g., diabetics) will render certain intervention concepts likely (insulin therapy) while effectively precluding others (radiation therapy). Such correlations should be exploited. We propose a novel neural model that addresses these challenges. We introduce a Candidate-Selector architecture in which the model considers setes of candidate concepts for PICO elements, and assesses their plausibility conditioned on the input text to be annotated. This relies on a 'candidate set' generator, which may be learned or relies on heuristics. A conditional discriminative neural model then jointly selects candidate concepts, given the input text. We compare the predictive performance of our approach to strong baselines, and show that it outperforms them. Finally, we perform a qualitative evaluation of the generated annotations by asking domain experts to assess their quality.\",\"PeriodicalId\":74507,\"journal\":{\"name\":\"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management\",\"volume\":\"2017 \",\"pages\":\"1519-1528\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5752318/pdf/nihms927025.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3132847.3132989\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3132847.3132989","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

我们考虑的任务是用受控结构化医学词汇中的概念自动注释描述临床试验的自由文本。具体来说，我们的目标是建立一个模型，以推断出不同的（本体论）概念集，这些概念集描述了相关试验的互补性临床突出方面：入组人群、实施的干预措施和测量的结果，即 PICO 要素。这个重要的实际问题提出了几个关键挑战。一个问题是，由于词汇包含许多独特的概念，因此输出空间非常大。使这一问题更加复杂的是，该领域的注释数据收集成本高昂，因此数量稀少。此外，输出结果（每个 PICO 要素的概念集）是相互关联的：特定人群（如糖尿病患者）可能会使用某些干预概念（胰岛素疗法），同时有效地排除其他概念（放射疗法）。这种相关性应该加以利用。我们提出了一种新型神经模型来应对这些挑战。我们引入了 "候选-选择器 "架构，在该架构中，模型会考虑 PICO 要素的候选概念集，并根据要注释的输入文本评估其合理性。这依赖于一个 "候选集 "生成器，该生成器可以是学习得来的，也可以依赖于启发式方法。然后，一个条件判别神经模型会根据输入文本共同选择候选概念。我们将我们的方法与强大的基线进行了预测性能比较，结果表明我们的方法优于它们。最后，我们请领域专家对生成的注释质量进行了定性评估。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

A Neural Candidate-Selector Architecture for Automatic Structured Clinical Text Annotation.

查看原文本刊更多论文

A Neural Candidate-Selector Architecture for Automatic Structured Clinical Text Annotation.

We consider the task of automatically annotating free texts describing clinical trials with concepts from a controlled, structured medical vocabulary. Specifically we aim to build a model to infer distinct sets of (ontological) concepts describing complementary clinically salient aspects of the underlying trials: the populations enrolled, the interventions administered and the outcomes measured, i.e., the PICO elements. This important practical problem poses a few key challenges. One issue is that the output space is vast, because the vocabulary comprises many unique concepts. Compounding this problem, annotated data in this domain is expensive to collect and hence sparse. Furthermore, the outputs (sets of concepts for each PICO element) are correlated: specific populations (e.g., diabetics) will render certain intervention concepts likely (insulin therapy) while effectively precluding others (radiation therapy). Such correlations should be exploited. We propose a novel neural model that addresses these challenges. We introduce a Candidate-Selector architecture in which the model considers setes of candidate concepts for PICO elements, and assesses their plausibility conditioned on the input text to be annotated. This relies on a 'candidate set' generator, which may be learned or relies on heuristics. A conditional discriminative neural model then jointly selects candidate concepts, given the input text. We compare the predictive performance of our approach to strong baselines, and show that it outperforms them. Finally, we perform a qualitative evaluation of the generated annotations by asking domain experts to assess their quality.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management

自引率

0.00%

发文量