Using deep neural network to recognize mutation entities in biomedical literature
Fan Tong, Zheheng Luo, Dongsheng Zhao
2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2018-12-01
DOI: 10.1109/BIBM.2018.8621134 (https://doi.org/10.1109/BIBM.2018.8621134)
Citations: 5
Abstract
Automatically recognizing mutation mentions plays a fundamental role in extracting variant-disease relations from biomedical literature. In this paper, we propose a model for mutation mention detection that combines a deep neural network with a decoding algorithm and regular expressions. Inspired by distributed representations of words and characters, we split each word into tokens at the boundaries between letters of different case, numbers, and special characters, and train a token embedding that can capture some nomenclature features of mutations. The network uses bidirectional LSTM (long short-term memory) layers to learn a general form of mutation mentions while capturing long-range context information, and fully connected layers to improve the fitting capability; its input is the concatenation of word vectors trained from the token embeddings. The Viterbi algorithm is used to decode the network output and obtain an initial label sequence. On top of that, regular expression patterns are used to label mutation mentions, which provides extra information for refining the initial output. Trained and tested on the NCBI tmVar mutation corpus, our model achieved an F-score of 91.59%, which is better than currently reported systems.
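
The word-splitting step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact splitting rules, so the pattern below is an assumption: it treats each run of uppercase letters, lowercase letters, or digits as one token and every other non-space character as its own token. The names `TOKEN_PATTERN` and `split_word` are hypothetical and do not come from the paper.

```python
import re

# Assumed splitting rule (not taken from the paper): runs of uppercase letters,
# runs of lowercase letters, runs of digits, and each remaining non-space
# character as a separate token.
TOKEN_PATTERN = re.compile(r"[A-Z]+|[a-z]+|[0-9]+|[^A-Za-z0-9\s]")


def split_word(word: str) -> list[str]:
    """Split one word into case, digit, and special-character tokens."""
    return TOKEN_PATTERN.findall(word)


if __name__ == "__main__":
    # Example mutation mentions written in common nomenclature styles.
    for mention in ["V600E", "c.2708_2711delTTAG", "p.Arg117His"]:
        print(mention, split_word(mention))
```

Under this rule a mention such as `V600E` becomes the tokens `V`, `600`, `E`, so the wild-type residue, the position, and the mutant residue each receive their own embedding. That is one plausible way a token embedding could pick up nomenclature patterns; the paper's actual tokenizer may differ.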