Clinical entity recognition using structural support vector machines with rich features

Buzhou Tang, Yonghui Wu, Min Jiang, Hua Xu
{"title":"Clinical entity recognition using structural support vector machines with rich features","authors":"Buzhou Tang, Yonghui Wu, Min Jiang, Hua Xu","doi":"10.1145/2390068.2390073","DOIUrl":null,"url":null,"abstract":"Named entity recognition (NER) is an important task for natural language processing (NLP) of clinical text. Conditional Random Fields (CRFs), a sequential labeling algorithm, and Support Vector Machines (SVMs), which is based on large margin theory, are two typical machine learning algorithms that have been widely applied to NER tasks, including clinical entity recognition. However, Structural Support Vector Machines (SSVMs), an algorithm that combines the advantages of both CRFs and SVMs, has not been investigated for clinical text processing. In this study, we applied the SSVMs algorithm to the Concept Extraction task of the 2010 i2b2 clinical NLP challenge, which was to recognize entities of medical problems, treatments, and tests from hospital discharge summaries. Using the same training (N = 27,837) and test (N = 45,009) sets in the challenge, our evaluation showed that the SSVMs-based NER system required less training time, while achieved better performance than the CRFs-based system for clinical entity recognition, when same features were used. Our study also demonstrated that rich features such as unsupervised word representations improved the performance of clinical entity recognition. When rich features were integrated with SSVMs, our system achieved a highest F-measure of 85.74% on the test set of 2010 i2b2 NLP challenge, which outperformed the best system reported in the challenge by 0.5%.","PeriodicalId":143937,"journal":{"name":"Data and Text Mining in Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"58","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data and Text Mining in Bioinformatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2390068.2390073","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 58

Abstract

Named entity recognition (NER) is an important task for natural language processing (NLP) of clinical text. Conditional Random Fields (CRFs), a sequential labeling algorithm, and Support Vector Machines (SVMs), which is based on large margin theory, are two typical machine learning algorithms that have been widely applied to NER tasks, including clinical entity recognition. However, Structural Support Vector Machines (SSVMs), an algorithm that combines the advantages of both CRFs and SVMs, has not been investigated for clinical text processing. In this study, we applied the SSVMs algorithm to the Concept Extraction task of the 2010 i2b2 clinical NLP challenge, which was to recognize entities of medical problems, treatments, and tests from hospital discharge summaries. Using the same training (N = 27,837) and test (N = 45,009) sets in the challenge, our evaluation showed that the SSVMs-based NER system required less training time, while achieved better performance than the CRFs-based system for clinical entity recognition, when same features were used. Our study also demonstrated that rich features such as unsupervised word representations improved the performance of clinical entity recognition. When rich features were integrated with SSVMs, our system achieved a highest F-measure of 85.74% on the test set of 2010 i2b2 NLP challenge, which outperformed the best system reported in the challenge by 0.5%.
特征丰富的结构支持向量机临床实体识别
命名实体识别(NER)是临床文本自然语言处理(NLP)的一项重要任务。条件随机场(CRFs)是一种顺序标注算法,而支持向量机(svm)是基于大余量理论的两种典型的机器学习算法,已广泛应用于NER任务,包括临床实体识别。然而,结合CRFs和svm优点的结构支持向量机(ssvm)算法尚未被研究用于临床文本处理。在本研究中,我们将ssvm算法应用于2010年i2b2临床NLP挑战的概念提取任务,该任务是从医院出院摘要中识别医疗问题、治疗和测试的实体。在挑战中使用相同的训练集(N = 27,837)和测试集(N = 45,009),我们的评估表明,当使用相同的特征时,基于ssvm的NER系统所需的训练时间更少,但在临床实体识别方面取得了比基于crfs的系统更好的性能。我们的研究还表明,丰富的特征,如无监督的词表示,提高了临床实体识别的性能。当丰富的特征与ssvm集成时,我们的系统在2010年i2b2 NLP挑战的测试集上达到了85.74%的最高f值,比挑战中报告的最佳系统高出0.5%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信