Authors: F. Braga, N. Ebecken
Journal: International Journal of Nuclear Knowledge Management, volume 5 1
Publication type: Journal Article
Published: 2013-06-18
DOI: https://doi.org/10.1504/IJNKM.2013.054496
A semi-automatic method for extracting a taxonomy for nuclear knowledge using hierarchical document clustering based on concept sets
In this paper, we present a text mining approach for the semi-automatic extraction of a taxonomy of concepts for nuclear knowledge and evaluate the achievable results. Taxonomies are a fundamental part of any knowledge management strategy or framework. We propose a method for hierarchical document clustering based on the notion of frequent concept sets. Most clustering algorithms treat documents as a bag of words and ignore important relationships between words, such as synonymy. Our method instead takes the semantic relationships between words into account and uses a domain thesaurus (ETDE/INIS) to identify concepts. To validate the method, we conducted a case study in which we implemented a prototype that generates a taxonomy for nuclear knowledge, with the goal of conceptually mapping the scientific production of the Brazilian Nuclear Energy Commission (CNEN).
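The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of clustering by frequent concept sets, not the authors' prototype. It assumes a tiny hypothetical thesaurus (the `THESAURUS` entries, the function names, and the `min_support` and `max_size` parameters are all illustrative assumptions, not taken from ETDE/INIS or from the paper). Words are first mapped to concepts so that synonyms collapse to one label, concept sets that occur in enough documents are kept, and each frequent set then serves as a candidate cluster.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy thesaurus: surface term -> preferred concept label.
# These entries are illustrative only and are not taken from ETDE/INIS.
THESAURUS = {
    "reactor": "REACTORS",
    "reactors": "REACTORS",
    "pile": "REACTORS",
    "uranium": "URANIUM",
    "u-235": "URANIUM",
    "dosimetry": "DOSIMETRY",
    "dose": "DOSIMETRY",
}


def to_concepts(text):
    """Map a document to the set of thesaurus concepts its words refer to."""
    words = text.lower().split()
    return frozenset(THESAURUS[w] for w in words if w in THESAURUS)


def frequent_concept_sets(docs, min_support, max_size=2):
    """Return concept sets (up to max_size concepts) found in >= min_support docs."""
    counts = Counter()
    for concepts in docs:
        for size in range(1, max_size + 1):
            for subset in combinations(sorted(concepts), size):
                counts[frozenset(subset)] += 1
    return {s: n for s, n in counts.items() if n >= min_support}


def cluster_by_concept_set(docs, frequent):
    """Assign each document index to every frequent concept set it contains."""
    return {s: [i for i, c in enumerate(docs) if s <= c] for s in frequent}


texts = [
    "reactor fuel uses uranium",
    "pile loaded with u-235",
    "dose measurement near reactors",
    "dosimetry of workers",
]
docs = [to_concepts(t) for t in texts]
frequent = frequent_concept_sets(docs, min_support=2)
clusters = cluster_by_concept_set(docs, frequent)

# Print clusters from the most general (one concept) to the more specific.
for concept_set in sorted(clusters, key=lambda s: (len(s), sorted(s))):
    print(sorted(concept_set), "->", clusters[concept_set])
```

In this sketch the first two texts share no words, yet both map to the same concept set because "pile" and "u-235" resolve to the same concepts as "reactor" and "uranium"; a bag-of-words representation would miss that overlap. A hierarchy could then be read off by treating a smaller frequent set as the parent of any larger frequent set that contains it, although how the paper itself builds and prunes the hierarchy is not stated in the abstract.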