Title: Term recognition using Conditional Random fields
Authors: Xing Zhang, Yan Song, A. Fang
Venue: Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010)
Publication date: 2010-09-30
DOI: 10.1109/NLPKE.2010.5587809 (https://doi.org/10.1109/NLPKE.2010.5587809)
Citations: 14
Abstract
This study builds a biomedical term recognizer on a machine learning framework, Conditional Random Fields (CRF), that exploits syntactic information. The features used in this CRF framework capture syntactic information at different levels, including parent nodes, syntactic functions, syntactic paths, and term ratios. A series of experiments examines the effect of training size and the performance of general term recognition and novel term recognition. The experimental results show that features such as syntactic paths and term ratios achieve good precision in term recognition for both general terms and novel terms. However, the recall of novel term recognition remains unsatisfactory, which calls for more effective features. Overall, because this research studies in depth the use of some distinctive syntactic features, it offers an innovative approach to constructing machine learning-based term recognition systems.
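To make the feature types named in the abstract more concrete, the following is a minimal sketch of how per-token syntactic features might be derived from a constituency parse before being passed to a CRF toolkit. It is not the authors' implementation. The nested-tuple tree format, the `B`/`I`/`O` tags, the function names, and in particular the definition of "term ratio" (the share of a word's training occurrences that fall inside a term) are all assumptions made for illustration; syntactic functions are omitted because the abstract does not say how they are obtained.

```python
from collections import Counter


def leaves_with_paths(tree, path=()):
    """Yield (word, pos, path) for each leaf of a toy constituency tree.

    Assumed tree format: (label, child, ...) for phrases, (POS, word) for leaves.
    `path` holds the phrase labels from the root down to the leaf's parent.
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        yield children[0], label, path
        return
    for child in children:
        yield from leaves_with_paths(child, path + (label,))


def term_ratios(tagged_sentences):
    """Assumed definition: fraction of a word's occurrences tagged as part of a term."""
    total, in_term = Counter(), Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            key = word.lower()
            total[key] += 1
            if tag != "O":
                in_term[key] += 1
    return {word: in_term[word] / total[word] for word in total}


def token_features(tree, ratios):
    """Build one feature dict per token: parent node, syntactic path, term ratio."""
    features = []
    for word, pos, path in leaves_with_paths(tree):
        features.append({
            "word": word.lower(),
            "pos": pos,
            "parent": path[-1] if path else "ROOT",
            "path": "/".join(path),
            # Words unseen in training get a ratio of 0.0 in this sketch.
            "term_ratio": ratios.get(word.lower(), 0.0),
        })
    return features


# Hypothetical training data and parse, invented for this example.
train = [
    [("the", "O"), ("protein", "B"), ("kinase", "I"), ("binds", "O")],
    [("a", "O"), ("protein", "O"), ("sample", "O")],
]
tree = ("S",
        ("NP", ("DT", "The"), ("NN", "kinase")),
        ("VP", ("VBZ", "binds"), ("NP", ("NN", "DNA"))))

features = token_features(tree, term_ratios(train))
for feature in features:
    print(feature)
```

Each resulting dictionary corresponds to one token. A sequence of such dictionaries, paired with a tag sequence, is the general shape of input that CRF toolkits typically expect. Note that an unseen word such as "DNA" receives a term ratio of 0.0 here, which gives one plausible intuition for why novel terms are harder to recall with such a feature.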