Improving RNN with Attention and Embedding for Adverse Drug Reactions

Proceedings of the 2017 International Conference on Digital Health. Pub Date: 2017-07-02. DOI: 10.1145/3079452.3079501

Chandra Pandey, Zina M. Ibrahim, Honghan Wu, Ehtesham Iqbal, R. Dobson


Citations: 24

Abstract

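Electronic Health Record (EHR) narratives are a rich source of information, embedding high-resolution information of value to secondary research use. However, because EHRs are mostly natural-language free text and highly ambiguous, many natural language processing algorithms have been devised to extract meaningful structured information about clinical entities from them. The performance of these algorithms, however, varies largely with the training dataset and with how effectively background knowledge is used to steer the learning process. In this paper we study the impact of initializing the training of a neural network natural language processing algorithm with pre-defined clinical word embeddings to improve feature extraction and relationship classification between entities. We add our embedding framework to a bi-directional long short-term memory (Bi-LSTM) neural network, and further study the effect of using attention weights in neural networks for sequence labelling tasks to extract knowledge of Adverse Drug Reactions (ADRs). We incorporate unsupervised word embeddings, using Word2Vec and GloVe, from widely available medical resources such as the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) II corpora and the Unified Medical Language System (UMLS), and also embed a pharmaco lexicon from available EHRs. Our algorithm, implemented using two datasets, shows that our architecture outperforms baseline Bi-LSTM networks as well as Bi-LSTM networks using linear-chain and Skip-Chain conditional random fields (CRF).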
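The abstract names two ingredients without giving implementation details: initializing word embeddings from pre-trained clinical vectors, and weighting recurrent outputs with attention. The following is a minimal, dependency-free sketch of both ideas, not the authors' implementation. The function names, the random fallback for out-of-vocabulary tokens, the dot-product scoring, and the toy three-dimensional vectors are all assumptions made for illustration.

```python
import math
import random


def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Return one vector per vocabulary token.

    Tokens present in `pretrained` copy their vector; all others get
    small random values (an assumed fallback, not taken from the paper).
    """
    rng = random.Random(seed)
    matrix = []
    for token in vocab:
        if token in pretrained:
            matrix.append(list(pretrained[token]))
        else:
            matrix.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return matrix


def attention_weights(hidden_states, query):
    """Softmax over dot-product scores between each hidden state and a query."""
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden_states, query):
    """Attention-weighted sum of the hidden states (the context vector)."""
    weights = attention_weights(hidden_states, query)
    dim = len(hidden_states[0])
    return [
        sum(w * state[i] for w, state in zip(weights, hidden_states))
        for i in range(dim)
    ]


# Hypothetical toy vectors standing in for embeddings trained on clinical text.
pretrained = {"aspirin": [0.9, 0.1, 0.0], "rash": [0.1, 0.8, 0.2]}
vocab = ["patient", "aspirin", "rash"]
embeddings = build_embedding_matrix(vocab, pretrained, dim=3)

# The embeddings stand in for Bi-LSTM outputs here, purely for illustration.
query = [0.0, 1.0, 0.0]
weights = attention_weights(embeddings, query)
context = attend(embeddings, query)
```

In a real model the embedding matrix would seed a trainable embedding layer, and the hidden states and query would come from the Bi-LSTM rather than from the embeddings themselves.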



