BFSM: Finite state machine learned as name boundary definer for bio named entity recognition
Tsendsuren Munkhdalai, Meijing Li, Erdenetuya Namsrai, Oyun-Erdene Namsrai, K. Ryu
2011 3rd International Conference on Awareness Science and Technology (iCAST), published 2011-09-01
DOI: 10.1109/ICAWST.2011.6163168 (https://doi.org/10.1109/ICAWST.2011.6163168)
Citation count: 6
Abstract
An essential task in automated information extraction from biomedical literature is bio named entity recognition, which defines the boundaries between ordinary words and biomedical technical terms in text and then classifies those terms based on domain knowledge. Because of the nature of bio named entities, defining their boundaries in text remains challenging. This paper proposes using the part-of-speech tags of tokens as the target observations of a name boundary definer. We propose an approach for modeling a finite state machine as the boundary definer. Aided by machine learning methods, including frequent pattern mining and a Bayesian network, the finite state machine learns from the part-of-speech tags of tokens in bio-text data. The finite state machine based on the Bayesian network is named BFSM. In addition, we report the influence of the part-of-speech tagger on the learning of BFSM. Experimental results show that a named entity recognition system using BFSM achieves high accuracy, with an F-score of 85.8.
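To make the idea of a boundary-defining state machine over part-of-speech tags concrete, the following is a minimal sketch and not the BFSM described in the abstract. It assumes a two-state machine (0 = outside a name, 1 = inside a name) and learns each transition by simple frequency counting, which stands in for the frequent pattern mining and Bayesian network the paper uses. The training data, the tag set, and the function names `learn_transitions` and `find_boundaries` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data: one (POS tag, inside-name flag) pair per token.
TRAIN = [
    [("DT", 0), ("NN", 1), ("NN", 1), ("VBZ", 0), ("JJ", 1), ("NN", 1)],
    [("NN", 1), ("CD", 1), ("VBZ", 0), ("DT", 0), ("NN", 0)],
]


def learn_transitions(sentences):
    """Map (current state, POS tag) to the most frequent next state."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        state = 0  # every sentence starts outside a name
        for tag, label in sentence:
            counts[(state, tag)][label] += 1
            state = label
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def find_boundaries(tags, transitions):
    """Run the machine over POS tags; return (start, end) spans, end exclusive."""
    spans, state, start = [], 0, None
    for i, tag in enumerate(tags):
        # Assumption: an unseen (state, tag) pair falls back to "outside".
        nxt = transitions.get((state, tag), 0)
        if nxt == 1 and state == 0:
            start = i  # a name begins here
        elif nxt == 0 and state == 1:
            spans.append((start, i))  # the name ended before this token
        state = nxt
    if state == 1:  # close a name that runs to the end of the sentence
        spans.append((start, len(tags)))
    return spans


transitions = learn_transitions(TRAIN)
print(find_boundaries(["DT", "JJ", "NN", "VBZ", "NN"], transitions))
```

With this toy data the machine marks tokens 1 to 2 and token 4 as candidate names. A real system would replace the majority-vote transitions with a probabilistic model and would need far more training data than shown here.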