Exploiting collaborative learning for concept extraction in the medical field

Proceedings of the 2nd International Conference on Communication and Information Processing. Pub Date: 2016-11-26. DOI: 10.1145/3018009.3018054

Meng Tian, Jianqiang Li, Jijiang Yang, Bo Liu, Xi Meng, Ronghua Li, J. Bi



Abstract

With growing interest in the secondary use of medical data, concept extraction from electronic medical records (EMRs) has drawn increasing scholarly attention. Because manual data annotation is labor intensive, concept extraction methods mainly use fully labeled documents as training data to build a concept instance identifier. In many cases, however, the available training data are only sparsely labeled, which degrades the performance of the resulting classifier. Existing concept extraction methods consider either the diversity of datasets or the variety of learning models, but not both. This paper therefore proposes a novel approach that improves concept extraction from electronic medical records by combining the diversity of datasets with a variety of learning models. The large, sparsely labeled dataset is split into multiple subsets, and the different subsets are then used to train different learning models, such as HMM, MEMM, and CRF, in an iterative way. The technique leverages the fact that different learning algorithms have different inductive biases and that better predictions can be made by majority vote.
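The splitting and voting steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how subsets are formed, how ties are resolved, or which label scheme is used, so the round-robin split, the strict-majority rule with a fallback label, and the BIO-style labels below are all assumptions made for illustration. The three label sequences stand in for the outputs of trained HMM, MEMM, and CRF taggers, which are not implemented here.

```python
from collections import Counter
from typing import List, Sequence


def split_into_subsets(items: Sequence, k: int) -> List[list]:
    """Round-robin split of a dataset into k subsets (assumed strategy)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [list(items[i::k]) for i in range(k)]


def majority_vote(predictions: Sequence[Sequence[str]],
                  default: str = "O") -> List[str]:
    """Combine per-token labels from several taggers by majority vote.

    predictions[i][t] is the label that tagger i assigns to token t.
    A label is accepted only if more than half of the taggers agree;
    otherwise the (assumed) fallback label `default` is used.
    """
    if not predictions:
        return []
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all taggers must label the same number of tokens")
    voted = []
    for token_labels in zip(*predictions):
        label, count = Counter(token_labels).most_common(1)[0]
        voted.append(label if count * 2 > len(predictions) else default)
    return voted


# Hypothetical outputs of three taggers on one four-token sentence.
hmm_labels = ["B-problem", "I-problem", "O", "O"]
memm_labels = ["B-problem", "O", "O", "B-test"]
crf_labels = ["B-problem", "I-problem", "O", "B-test"]

combined = majority_vote([hmm_labels, memm_labels, crf_labels])
print(combined)  # ['B-problem', 'I-problem', 'O', 'B-test']
```

In an iterative scheme of the kind the abstract describes, labels accepted by the vote could be added back to the sparsely labeled subsets before the models are retrained; that loop is omitted here because the abstract gives no details about it.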

