Exploiting collaborative learning for concept extraction in the medical field
Meng Tian, Jianqiang Li, Jijiang Yang, Bo Liu, Xi Meng, Ronghua Li, J. Bi
Proceedings of the 2nd International Conference on Communication and Information Processing, 2016-11-26
DOI: 10.1145/3018009.3018054 (https://doi.org/10.1145/3018009.3018054)
Abstract
With growing interest in the secondary use of medical data, concept extraction from Electronic Medical Records has attracted increasing attention. Concept extraction methods mainly rely on fully labeled documents as training data for building a concept instance identifier. However, manual annotation is labor intensive, so in many cases the available training data are only sparsely labeled, and the resulting classifier performs poorly. Existing concept extraction methods exploit either the diversity of datasets or the diversity of learning models, but not both. This paper therefore proposes a novel approach that improves concept extraction from electronic medical records by combining dataset diversity with multiple learning models. The large, sparsely labeled dataset is split into multiple subsets, and different learning models, such as HMM, MEMM, and CRF, are then trained on the different subsets in an iterative way. Our technique leverages the fact that different learning algorithms have different inductive biases and that better predictions can be made by majority vote.
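
The two mechanical steps named in the abstract, splitting the dataset into subsets and combining the models' outputs by majority vote, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the round-robin split, the tie-breaking rule, and the B/I/O label scheme are all assumptions made for illustration, and the hard-coded label sequences merely stand in for the outputs of trained HMM, MEMM, and CRF taggers. The abstract does not specify the iterative training loop, so it is left out.

```python
from collections import Counter
from typing import List, Sequence


def split_into_subsets(documents: Sequence, k: int) -> List[list]:
    """Round-robin split of the documents into k subsets (assumed strategy)."""
    return [list(documents[i::k]) for i in range(k)]


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Per-token majority vote over equal-length label sequences.

    Ties go to the earliest model in `predictions` (an assumed rule).
    """
    voted = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        best = max(counts.values())
        # First label, in model order, that reaches the top count.
        voted.append(next(label for label in labels if counts[label] == best))
    return voted


# Hypothetical outputs of three taggers for one three-token sentence.
hmm_labels = ["B", "I", "O"]
memm_labels = ["B", "O", "O"]
crf_labels = ["O", "I", "O"]

combined = majority_vote([hmm_labels, memm_labels, crf_labels])
subsets = split_into_subsets(list(range(7)), 3)
```

Here each token receives the label that at least two of the three stand-in models agree on, which is how models with different inductive biases can correct one another's individual errors.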