药物不良反应的关注和嵌入改进RNN

Chandra Pandey, Zina M. Ibrahim, Honghan Wu, Ehtesham Iqbal, R. Dobson
{"title":"药物不良反应的关注和嵌入改进RNN","authors":"Chandra Pandey, Zina M. Ibrahim, Honghan Wu, Ehtesham Iqbal, R. Dobson","doi":"10.1145/3079452.3079501","DOIUrl":null,"url":null,"abstract":"Electronic Health Records (EHR) narratives are a rich source of information, embedding high-resolution information of value to secondary research use. However, because the EHRs are mostly in natural language free-text and highly ambiguity-ridden, many natural language processing algorithms have been devised around them to extract meaningful structured information about clinical entities. The performance of the algorithms however, largely varies depending on the training dataset as well as the effectiveness of the use of background knowledge to steer the learning process. In this paper we study the impact of initializing the training of a neural network natural language processing algorithm with pre-defined clinical word embeddings to improve feature extraction and relationship classification between entities. We add our embedding framework to a bi-directional long short-term memory (Bi-LSTM) neural network, and further study the effect of using attention weights in neural networks for sequence labelling tasks to extract knowledge of Adverse Drug Reactions (ADRs). We incorporate unsupervised word embeddings using Word2Vec and GloVe from widely available medical resources such as Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) II corpora, Unified Medical Language System (UMLS) as well as embed pharmaco lexicon from available EHRs. Our algorithm, implemented using two datasets, shows that our architecture outperforms baseline Bi-LSTM or Bi-LSTM networks using linear chain and Skip-Chain conditional random fields (CRF).","PeriodicalId":245682,"journal":{"name":"Proceedings of the 2017 International Conference on Digital Health","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":"{\"title\":\"Improving RNN with Attention and Embedding for Adverse Drug Reactions\",\"authors\":\"Chandra Pandey, Zina M. Ibrahim, Honghan Wu, Ehtesham Iqbal, R. Dobson\",\"doi\":\"10.1145/3079452.3079501\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Electronic Health Records (EHR) narratives are a rich source of information, embedding high-resolution information of value to secondary research use. However, because the EHRs are mostly in natural language free-text and highly ambiguity-ridden, many natural language processing algorithms have been devised around them to extract meaningful structured information about clinical entities. The performance of the algorithms however, largely varies depending on the training dataset as well as the effectiveness of the use of background knowledge to steer the learning process. In this paper we study the impact of initializing the training of a neural network natural language processing algorithm with pre-defined clinical word embeddings to improve feature extraction and relationship classification between entities. We add our embedding framework to a bi-directional long short-term memory (Bi-LSTM) neural network, and further study the effect of using attention weights in neural networks for sequence labelling tasks to extract knowledge of Adverse Drug Reactions (ADRs). We incorporate unsupervised word embeddings using Word2Vec and GloVe from widely available medical resources such as Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) II corpora, Unified Medical Language System (UMLS) as well as embed pharmaco lexicon from available EHRs. Our algorithm, implemented using two datasets, shows that our architecture outperforms baseline Bi-LSTM or Bi-LSTM networks using linear chain and Skip-Chain conditional random fields (CRF).\",\"PeriodicalId\":245682,\"journal\":{\"name\":\"Proceedings of the 2017 International Conference on Digital Health\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-07-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"24\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2017 International Conference on Digital Health\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3079452.3079501\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 International Conference on Digital Health","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3079452.3079501","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 24

摘要

电子健康记录(EHR)叙述是一个丰富的信息来源,嵌入了对二次研究使用有价值的高分辨率信息。然而,由于电子病历大多是自然语言的自由文本,并且极易产生歧义,因此围绕它们设计了许多自然语言处理算法来提取有关临床实体的有意义的结构化信息。然而,算法的性能在很大程度上取决于训练数据集以及使用背景知识来指导学习过程的有效性。在本文中,我们研究了使用预定义临床词嵌入初始化神经网络自然语言处理算法的训练对改善实体之间的特征提取和关系分类的影响。我们将嵌入框架添加到双向长短期记忆(Bi-LSTM)神经网络中,并进一步研究了在神经网络中使用注意力权重进行序列标记任务以提取药物不良反应(adr)知识的效果。我们使用Word2Vec和GloVe进行无监督词嵌入,这些词嵌入来自广泛可用的医疗资源,如重症监护多参数智能监测(MIMIC) II语料库、统一医学语言系统(UMLS),以及嵌入来自现有电子病历的药物词典。我们的算法使用两个数据集实现,表明我们的架构优于使用线性链和跳链条件随机场(CRF)的基线Bi-LSTM或Bi-LSTM网络。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Improving RNN with Attention and Embedding for Adverse Drug Reactions
Electronic Health Records (EHR) narratives are a rich source of information, embedding high-resolution information of value to secondary research use. However, because the EHRs are mostly in natural language free-text and highly ambiguity-ridden, many natural language processing algorithms have been devised around them to extract meaningful structured information about clinical entities. The performance of the algorithms however, largely varies depending on the training dataset as well as the effectiveness of the use of background knowledge to steer the learning process. In this paper we study the impact of initializing the training of a neural network natural language processing algorithm with pre-defined clinical word embeddings to improve feature extraction and relationship classification between entities. We add our embedding framework to a bi-directional long short-term memory (Bi-LSTM) neural network, and further study the effect of using attention weights in neural networks for sequence labelling tasks to extract knowledge of Adverse Drug Reactions (ADRs). We incorporate unsupervised word embeddings using Word2Vec and GloVe from widely available medical resources such as Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) II corpora, Unified Medical Language System (UMLS) as well as embed pharmaco lexicon from available EHRs. Our algorithm, implemented using two datasets, shows that our architecture outperforms baseline Bi-LSTM or Bi-LSTM networks using linear chain and Skip-Chain conditional random fields (CRF).
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信