Terminology Recognition System based on Machine Learning for Scientific Document Analysis

Yunsoo Choi, Sa-kwang Song, H. Chun, Chang-Hoo Jeong, Sung-Pil Choi
Journal: The Kips Transactions:partd
Publication date: 2011-10-31
DOI: 10.3745/KIPSTD.2011.18D.5.329 (https://doi.org/10.3745/KIPSTD.2011.18D.5.329)
Citations: 3

Abstract

Terminology recognition, a preliminary step for text mining, information extraction, information retrieval, the semantic web, and question answering, has been studied intensively in only a limited range of domains, especially the biomedical domain. Previous approaches are difficult to apply to general domains because the resources they rely on are domain-specific. We therefore propose a domain-independent terminology recognition system based on machine learning that uses a dictionary, syntactic features, and Web search results. Compared with C-value, a widely used related approach based on local domain frequencies, the proposed approach achieved an F-score of 80.8, a 6.5% improvement. In a second experiment with various combinations of unithood features, the method combined with NGD (Normalized Google Distance) showed the best performance, an F-score of 81.8. We applied three machine learning methods (logistic regression, C4.5, and SVMs) and obtained the best score with the decision tree method, C4.5.
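The abstract names NGD (Normalized Google Distance) as the unithood feature that gave the best result but does not say how it was computed. As a rough illustration only, the sketch below assumes the standard definition by Cilibrasi and Vitányi, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))). The function name, the hit counts, and the index size N are hypothetical and are not taken from the paper.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance from search hit counts (illustrative sketch).

    fx, fy : hit counts for each word searched alone
    fxy    : hit count for both words searched together
    n      : assumed total number of indexed pages (hypothetical value)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # No co-occurrence evidence: treat the two words as unrelated.
        return math.inf
    if n <= min(fx, fy):
        # The formula needs N to exceed the individual hit counts.
        raise ValueError("n must be larger than the individual hit counts")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))
```

With these made-up counts, ngd(100, 100, 10, 10_000) evaluates to approximately 0.5. Smaller values mean the two words co-occur more often relative to their individual frequencies, which is why such a score can serve as a signal that a word sequence forms a single unit.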