Watcharachat Plangsri, Nalina Phisanbut, P. Piamsa-nga
2022 14th International Conference on Knowledge and Smart Technology (KST), published 2022-01-26. DOI: 10.1109/KST53302.2022.9729060 (https://doi.org/10.1109/KST53302.2022.9729060)
Unsupervised concept identification from a large corpus of research documents
Research documents play a crucial role in data-driven research. Identifying concepts in a corpus of research documents can lead to a better understanding of the current state of research and can reveal fruitful concepts hidden inside the corpus. However, manually analyzing the corpus is laborious and inefficient. Automating the process is challenging due to the lack of background knowledge to fill the semantic gap that exists between humans and machines. To address this issue, we introduce a novel method that leverages information from an online open resource, namely Wikipedia, to build background knowledge automatically. An experiment on a set of 13,636 research documents shows that the framework can effectively and efficiently identify a broad range of concepts within a large text corpus by exploiting only Wikipedia categories and the documents' titles.