A semi-automatic method for extracting a taxonomy for nuclear knowledge using hierarchical document clustering based on concept sets
F. Braga, N. Ebecken
International Journal of Nuclear Knowledge Management, published 2013-06-18
DOI: 10.1504/IJNKM.2013.054496 (https://doi.org/10.1504/IJNKM.2013.054496)
Citation count: 3
Abstract
In this paper, we present a text mining approach for the semi-automatic extraction of a taxonomy of concepts for nuclear knowledge and evaluate the results it achieves. Taxonomies are a fundamental part of any knowledge management strategy or framework. We propose a method for hierarchical document clustering based on the notion of frequent concept sets. Most clustering algorithms treat documents as a bag of words and ignore important relationships between words, such as synonymy. Our method instead considers the semantic relationships between words and uses a domain thesaurus (ETDE/INIS) to identify concepts. To validate the method, we conducted a case study in which we implemented a prototype that generates a taxonomy for nuclear knowledge, with the goal of conceptually mapping the scientific production of the Brazilian Nuclear Energy Commission (CNEN).
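The idea of frequent concept sets can be illustrated with a minimal sketch. It is not the prototype described in the paper: the thesaurus entries below are hypothetical stand-ins (not taken from ETDE/INIS), the function names `to_concepts` and `frequent_concept_sets` are ours, and the brute-force enumeration stands in for whatever frequent-set mining procedure the authors used. Synonyms are first collapsed into a shared concept, and every concept set covering at least `min_support` of the documents becomes a candidate cluster.

```python
from itertools import combinations

# Hypothetical toy thesaurus: surface term -> preferred concept label.
# Illustrative entries only; they are NOT taken from ETDE/INIS.
THESAURUS = {
    "reactor": "REACTORS",
    "reactors": "REACTORS",
    "pile": "REACTORS",
    "fuel": "NUCLEAR FUELS",
    "fuels": "NUCLEAR FUELS",
    "dosimetry": "DOSIMETRY",
    "dose": "DOSIMETRY",
}


def to_concepts(text):
    """Map a document to the set of thesaurus concepts it mentions."""
    tokens = text.lower().split()
    return frozenset(THESAURUS[t] for t in tokens if t in THESAURUS)


def frequent_concept_sets(docs, min_support, max_size=2):
    """Return {concept_set: [document indices]} for sets meeting min_support."""
    concept_docs = [to_concepts(d) for d in docs]
    vocabulary = sorted(set().union(*concept_docs))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(vocabulary, size):
            concept_set = frozenset(combo)
            # A document supports the set if it contains every concept in it.
            cover = [i for i, c in enumerate(concept_docs) if concept_set <= c]
            if len(cover) / len(docs) >= min_support:
                result[concept_set] = cover
    return result


docs = [
    "reactor fuel performance",
    "pile fuels under irradiation",
    "dose measurements near the reactor",
    "personal dosimetry practice",
]
clusters = frequent_concept_sets(docs, min_support=0.5)
for concept_set, members in clusters.items():
    print(sorted(concept_set), members)
```

In this toy corpus, "reactor" and "pile" fall under the same concept, so the first two documents end up in the same candidate cluster even though they share no surface word. Larger concept sets are supersets of smaller ones, which is what makes a parent-child hierarchy natural: the cluster for {NUCLEAR FUELS, REACTORS} can be placed under the cluster for {REACTORS}.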