Using deep neural network to recognize mutation entities in biomedical literature

2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) Pub Date : 2018-12-01 DOI:10.1109/BIBM.2018.8621134

Fan Tong, Zheheng Luo, Dongsheng Zhao


Citations: 5

Abstract

Automatically recognizing mutation mentions plays a fundamental role in extracting variant-disease relations from biomedical literature. In this paper, we propose a model for mutation mention detection that combines a deep neural network with a decoding algorithm and regular expressions. Inspired by distributed representations of words and characters, we split each word into tokens at the boundaries between letters of different case, numbers, and special characters, and train a token embedding that captures some nomenclature features of mutations. The network uses bidirectional LSTM (long short-term memory) layers to learn a general form of mutation mentions while capturing long-range context, followed by fully connected layers to improve fitting capability; its input is the concatenation of word vectors trained from the token embeddings. The Viterbi algorithm decodes the network output to obtain an initial label sequence. In addition, regular expression patterns are used to label mutation mentions, providing extra information to refine the initial output. Trained and tested on the NCBI tmVar mutation corpus, our model achieved an F-score of 91.59%, outperforming previously reported systems.
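The abstract does not give the exact splitting rules, label set, or regular expressions, so the following Python sketch only illustrates the three non-neural steps under stated assumptions. The token regex, the two mutation patterns, the O/B/I label indices, and the function names `split_tokens`, `regex_mentions`, and `viterbi` are all hypothetical and not taken from the paper.

```python
import re

# Assumption: a token is a run of uppercase letters, a run of lowercase
# letters, a run of digits, or a single special character.
TOKEN_RE = re.compile(r"[A-Z]+|[a-z]+|[0-9]+|[^A-Za-z0-9\s]")

def split_tokens(word):
    """Split one word into case/digit/special-character tokens."""
    return TOKEN_RE.findall(word)

# Assumption: two illustrative patterns only (a DNA substitution such as
# c.1799T>A and a one-letter protein substitution such as V600E).
MUTATION_PATTERNS = [
    re.compile(r"\bc\.\d+[ACGT]>[ACGT]"),
    re.compile(r"\b[ACDEFGHIKLMNPQRSTVWY]\d+[ACDEFGHIKLMNPQRSTVWY]\b"),
]

def regex_mentions(text):
    """Return sorted (start, end, matched_text) spans found by the patterns."""
    spans = []
    for pattern in MUTATION_PATTERNS:
        spans.extend((m.start(), m.end(), m.group()) for m in pattern.finditer(text))
    return sorted(spans)

def viterbi(emissions, transitions):
    """Return the highest-scoring label path.

    emissions[t][j]   : score of label j at position t (e.g. network output)
    transitions[i][j] : score of moving from label i to label j
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    back = []  # back[t-1][j] = best previous label for label j at position t
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(range(n_labels), key=lambda j: score[j])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

if __name__ == "__main__":
    print(split_tokens("p.Val600Glu"))
    print(regex_mentions("The BRAF V600E mutation (c.1799T>A) was found."))
    # Labels: 0 = O, 1 = B, 2 = I; the O -> I transition is heavily penalized.
    trans = [[0, 0, -10], [0, 0, 0], [0, 0, 0]]
    print(viterbi([[2, 0, 0], [0, 1, 1.5], [0, 0, 2]], trans))
```

In the toy Viterbi call, picking the best label at each position independently would give O, I, I, which is an invalid sequence; the transition penalty steers the decoder to O, B, I instead. This is the kind of correction that decoding can add on top of per-position network scores.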

