Clinical entity recognition using structural support vector machines with rich features

Data and Text Mining in Bioinformatics. Pub Date: 2012-10-29. DOI: 10.1145/2390068.2390073

Buzhou Tang, Yonghui Wu, Min Jiang, Hua Xu

Citations: 58

Abstract

Named entity recognition (NER) is an important task in natural language processing (NLP) of clinical text. Conditional Random Fields (CRFs), a sequential labeling algorithm, and Support Vector Machines (SVMs), which are based on large-margin theory, are two typical machine learning algorithms that have been widely applied to NER tasks, including clinical entity recognition. However, Structural Support Vector Machines (SSVMs), an algorithm that combines the advantages of both CRFs and SVMs, have not been investigated for clinical text processing. In this study, we applied the SSVMs algorithm to the Concept Extraction task of the 2010 i2b2 clinical NLP challenge, which was to recognize entities of medical problems, treatments, and tests in hospital discharge summaries. Using the same training (N = 27,837) and test (N = 45,009) sets as in the challenge, our evaluation showed that the SSVMs-based NER system required less training time while achieving better performance than the CRFs-based system for clinical entity recognition when the same features were used. Our study also demonstrated that rich features such as unsupervised word representations improved the performance of clinical entity recognition. When rich features were integrated with SSVMs, our system achieved its highest F-measure of 85.74% on the test set of the 2010 i2b2 NLP challenge, which outperformed the best system reported in the challenge by 0.5%.
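
For readers unfamiliar with how an F-measure is obtained for entity recognition, the following is a minimal sketch and not the authors' evaluation code. It assumes a BIO tagging scheme and an exact-match criterion (an entity counts as correct only if its boundaries and type both match); neither assumption is stated in the abstract, and the tag names and function names are hypothetical.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); an illustrative representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # An "I-" tag with no open span of the same type is ignored.
    return spans


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F-measure over entity spans."""
    gold_spans = bio_to_spans(gold)
    pred_spans = bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical six-token sentence with the three entity types named in the abstract.
gold_tags = ["B-problem", "I-problem", "O", "B-test", "O", "B-treatment"]
pred_tags = ["B-problem", "I-problem", "O", "B-test", "I-test", "O"]
p, r, f = precision_recall_f1(gold_tags, pred_tags)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this toy case only the "problem" entity matches exactly: the predicted "test" span is one token too long and the "treatment" entity is missed, so the boundary error costs both precision and recall.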
