The Method of Medical Named Entity Recognition Based on Semantic Model and Improved SVM-KNN Algorithm

Xia Han, Ruonan Rao
{"title":"The Method of Medical Named Entity Recognition Based on Semantic Model and Improved SVM-KNN Algorithm","authors":"Xia Han, Ruonan Rao","doi":"10.1109/SKG.2011.24","DOIUrl":null,"url":null,"abstract":"In the medical field, a lot of unstructured information which is expressed by natural language exists in medical literature, technical documentation and medical records. IE (Information Extraction) as one of the most important research directions in natural language process aims to help humans extract concerned information automatically. NER (Named Entity Recognition) is one of the subsystems of IE and has direct influence on the quality of IE. Nowadays NER of medical field has not reached ideal precision largely due to the knowledge complexity in medical field. It is hard to describe the medical entity definitely in feature selection and current selected features are not rich and lack of semantic information. Besides that, different medical entities which have the similar characters more easily cause classification algorithm making wrong judgment. Combination multi classification algorithms such as SVM-KNN can overcome its own disadvantages and get higher performance. But current SVM-KNN classification algorithm may provide wrong categories results due to setting inappropriate K value and unbalanced examples distribution. In this paper, we introduce two-level modelling tool to help specialists to build semantic models and select features from them. We design and implement medical named entity recognition analysis engine based on UIMA framework and adopt improved SVM-KNN algorithm called EK-SVM-KNN (Extending K SVM-KNN) in classification. We collect experiment data from SLE(Systemic Lupus Erythematosus) clinical information system in Renji Hospital. We adopt 50 Pathology reports as training data and provide 1000 Pathology as test data. Experiment shows medical NER based on semantic model and improved SVM-KNN algorithm can enhance the quality of NER and we get the precision, recall rate and F value up to 86%.","PeriodicalId":184788,"journal":{"name":"2011 Seventh International Conference on Semantics, Knowledge and Grids","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 Seventh International Conference on Semantics, Knowledge and Grids","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SKG.2011.24","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 21

Abstract

In the medical field, a lot of unstructured information which is expressed by natural language exists in medical literature, technical documentation and medical records. IE (Information Extraction) as one of the most important research directions in natural language process aims to help humans extract concerned information automatically. NER (Named Entity Recognition) is one of the subsystems of IE and has direct influence on the quality of IE. Nowadays NER of medical field has not reached ideal precision largely due to the knowledge complexity in medical field. It is hard to describe the medical entity definitely in feature selection and current selected features are not rich and lack of semantic information. Besides that, different medical entities which have the similar characters more easily cause classification algorithm making wrong judgment. Combination multi classification algorithms such as SVM-KNN can overcome its own disadvantages and get higher performance. But current SVM-KNN classification algorithm may provide wrong categories results due to setting inappropriate K value and unbalanced examples distribution. In this paper, we introduce two-level modelling tool to help specialists to build semantic models and select features from them. We design and implement medical named entity recognition analysis engine based on UIMA framework and adopt improved SVM-KNN algorithm called EK-SVM-KNN (Extending K SVM-KNN) in classification. We collect experiment data from SLE(Systemic Lupus Erythematosus) clinical information system in Renji Hospital. We adopt 50 Pathology reports as training data and provide 1000 Pathology as test data. Experiment shows medical NER based on semantic model and improved SVM-KNN algorithm can enhance the quality of NER and we get the precision, recall rate and F value up to 86%.
基于语义模型和改进SVM-KNN算法的医学命名实体识别方法
在医学领域,医学文献、技术文档和病历中存在大量以自然语言表达的非结构化信息。信息抽取(Information Extraction, IE)是自然语言处理中最重要的研究方向之一,旨在帮助人类自动抽取相关信息。命名实体识别(NER, Named Entity Recognition)是IE的一个子系统,直接影响到IE的质量。由于医学领域知识的复杂性,目前医学领域的NER并没有达到理想的精度。在特征选择上难以对医疗实体进行明确的描述,目前所选择的特征语义信息不够丰富和缺乏。此外,具有相似特征的不同医疗实体更容易导致分类算法做出错误判断。SVM-KNN等组合多分类算法可以克服自身的缺点,获得更高的性能。但是目前的SVM-KNN分类算法由于K值设置不合适,样本分布不平衡,可能会给出错误的分类结果。在本文中,我们引入了两级建模工具,以帮助专家建立语义模型并从中选择特征。我们设计并实现了基于UIMA框架的医学命名实体识别分析引擎,并在分类中采用改进的SVM-KNN算法EK-SVM-KNN (extended K SVM-KNN)。实验数据来源于仁济医院系统性红斑狼疮临床信息系统。我们采用50份病理学报告作为培训数据,提供1000份病理学作为测试数据。实验表明,基于语义模型的医学NER和改进的SVM-KNN算法可以提高NER的质量,准确率、召回率和F值均达到86%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信