Yunsoo Choi, Sa-kwang Song, H. Chun, Chang-Hoo Jeong, Sung-Pil Choi
{"title":"基于机器学习的科学文献分析术语识别系统","authors":"Yunsoo Choi, Sa-kwang Song, H. Chun, Chang-Hoo Jeong, Sung-Pil Choi","doi":"10.3745/KIPSTD.2011.18D.5.329","DOIUrl":null,"url":null,"abstract":"Terminology recognition system which is a preceding research for text mining, information extraction, information retrieval, semantic web, and question-answering has been intensively studied in limited range of domains, especially in bio-medical domain. We propose a domain independent terminology recognition system based on machine learning method using dictionary, syntactic features, and Web search results, since the previous works revealed limitation on applying their approaches to general domain because their resources were domain specific. We achieved F-score 80.8 and 6.5% improvement after comparing the proposed approach with the related approach, C-value, which has been widely used and is based on local domain frequencies. In the second experiment with various combinations of unithood features, the method combined with NGD(Normalized Google Distance) showed the best performance of 81.8 on F-score. We applied three machine learning methods such as Logistic regression, C4.5, and SVMs, and got the best score from the decision tree method, C4.5.","PeriodicalId":348746,"journal":{"name":"The Kips Transactions:partd","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Terminology Recognition System based on Machine Learning for Scientific Document Analysis\",\"authors\":\"Yunsoo Choi, Sa-kwang Song, H. Chun, Chang-Hoo Jeong, Sung-Pil Choi\",\"doi\":\"10.3745/KIPSTD.2011.18D.5.329\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Terminology recognition system which is a preceding research for text mining, information extraction, information retrieval, semantic web, and question-answering has been intensively studied in limited range of domains, especially in bio-medical domain. We propose a domain independent terminology recognition system based on machine learning method using dictionary, syntactic features, and Web search results, since the previous works revealed limitation on applying their approaches to general domain because their resources were domain specific. We achieved F-score 80.8 and 6.5% improvement after comparing the proposed approach with the related approach, C-value, which has been widely used and is based on local domain frequencies. In the second experiment with various combinations of unithood features, the method combined with NGD(Normalized Google Distance) showed the best performance of 81.8 on F-score. We applied three machine learning methods such as Logistic regression, C4.5, and SVMs, and got the best score from the decision tree method, C4.5.\",\"PeriodicalId\":348746,\"journal\":{\"name\":\"The Kips Transactions:partd\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-10-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The Kips Transactions:partd\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3745/KIPSTD.2011.18D.5.329\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Kips Transactions:partd","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3745/KIPSTD.2011.18D.5.329","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Terminology Recognition System based on Machine Learning for Scientific Document Analysis
Terminology recognition system which is a preceding research for text mining, information extraction, information retrieval, semantic web, and question-answering has been intensively studied in limited range of domains, especially in bio-medical domain. We propose a domain independent terminology recognition system based on machine learning method using dictionary, syntactic features, and Web search results, since the previous works revealed limitation on applying their approaches to general domain because their resources were domain specific. We achieved F-score 80.8 and 6.5% improvement after comparing the proposed approach with the related approach, C-value, which has been widely used and is based on local domain frequencies. In the second experiment with various combinations of unithood features, the method combined with NGD(Normalized Google Distance) showed the best performance of 81.8 on F-score. We applied three machine learning methods such as Logistic regression, C4.5, and SVMs, and got the best score from the decision tree method, C4.5.