DCCSR: Document Clustering by Conceptual and Semantic Relevance as Factors of Unsupervised Learning
A. Rao, S. Ramakrishna
Proceedings of the International Conference on Advances in Information Communication Technology & Computing, 2016-08-12
DOI: 10.1145/2979779.2979822
Citations: 1
Abstract
Unsupervised learning over text documents is an essential process in knowledge discovery and data mining. Concept, context, and semantic relevance are important factors specific to text mining; they do not arise in unsupervised learning over record-structured data. Most current benchmark document clustering models rely on term frequency, and none of them considers concept, context, or semantic relations when clustering documents. To address this, our earlier work introduced a novel document clustering approach, Document Clustering by Conceptual Relevance (DCCR), which targets concept relevance. Drawing on lessons learned from the empirical study of DCCR, here we present a novel document clustering approach based on the concept and semantic relevance of documents. The main contribution of this proposal is feature formation by concept and semantic relevance. We propose an unsupervised learning approach that estimates the similarity between any two documents by a concept and semantic relevance score. The method represents a concept as the correlation between arguments and activities in the given documents, and assesses semantic relevance by estimating the similarity between documents through the hyponyms of the arguments. Experiments on benchmark datasets were conducted to assess the significance of the proposed model.
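The abstract describes the similarity computation only at a high level, so the following Python sketch shows one possible reading of it under explicit assumptions. Each document is assumed to be already reduced to a set of (argument, activity) pairs. Concept similarity is taken as the Jaccard overlap of those pairs. Semantic relevance is the Jaccard overlap of the argument sets after each argument is expanded with its hyponyms from a small hypothetical table (`HYPONYMS`); a real system might draw hyponyms from a lexical resource such as WordNet. The two scores are blended linearly with an illustrative weight `alpha`, and documents are grouped by single-link thresholding. None of these choices, names, or parameters come from the paper; this is not the DCCSR algorithm itself.

```python
from itertools import combinations

# HYPOTHETICAL toy hyponym table; a real system might draw these from a lexical
# resource such as WordNet. Not taken from the paper.
HYPONYMS = {
    "vehicle": {"car", "truck", "bus"},
    "animal": {"dog", "cat", "horse"},
}


def jaccard(a, b):
    """Overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def concept_similarity(doc_a, doc_b):
    # Assumption: a document is a set of (argument, activity) pairs, and concept
    # similarity is the Jaccard overlap of those pairs.
    return jaccard(doc_a, doc_b)


def expand_with_hyponyms(arguments):
    # Add every known hyponym of each argument to the argument set.
    expanded = set(arguments)
    for arg in arguments:
        expanded |= HYPONYMS.get(arg, set())
    return expanded


def semantic_similarity(doc_a, doc_b):
    # Assumption: compare the hyponym-expanded argument sets of both documents.
    args_a = expand_with_hyponyms({arg for arg, _ in doc_a})
    args_b = expand_with_hyponyms({arg for arg, _ in doc_b})
    return jaccard(args_a, args_b)


def relevance_score(doc_a, doc_b, alpha=0.5):
    # Hypothetical linear blend of the two scores; alpha is an illustrative weight.
    return alpha * concept_similarity(doc_a, doc_b) + (1 - alpha) * semantic_similarity(doc_a, doc_b)


def cluster(docs, threshold, alpha=0.5):
    # Single-link grouping: link any pair whose score reaches the threshold,
    # then return the connected components (via union-find) as index lists.
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(docs)), 2):
        if relevance_score(docs[i], docs[j], alpha) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical example documents, already reduced to (argument, activity) pairs.
docs = [
    {("car", "drive"), ("road", "build")},
    {("vehicle", "drive"), ("road", "repair")},
    {("dog", "bark"), ("cat", "sleep")},
    {("animal", "sleep"), ("dog", "run")},
]
print(cluster(docs, threshold=0.15))  # [[0, 1], [2, 3]]
```

In this toy example no two documents share an (argument, activity) pair, so the grouping comes entirely from hyponym expansion: "vehicle" pulls in "car" and "animal" pulls in "dog" and "cat". The highest pairwise score is 0.25, so raising the threshold to 0.3 leaves every document in its own cluster.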