FMODC: Fuzzy guided multi-objective document clustering by GA

2016 2nd International Conference on Contemporary Computing and Informatics (IC3I) Pub Date : 2016-12-01 DOI:10.1109/IC3I.2016.7918043

A. Rao, S. Ramakrishna


Citation count: 0

Abstract

In prevailing unsupervised text-mining methods, high dimensionality is becoming a major problem, because clustering does not optimally evaluate concept, context and semantic relevance, all of which are essential to the clustering process. Most earlier models consider factors such as term frequency and cluster on the semantic factor alone, although contextual and conceptual factors are also significant. As an extension of the earlier MODC and DC3SR models, this study proposes Multi-objective Distance based Optimal Document Clustering (MODC) by GA. Drawing on lessons from a review of earlier models, it discusses the scope for a fuzzy guided multi-objective optimal document clustering (FMODC) approach, which supports more effective computation and clustering using a genetic algorithm. In the experiments, which use meta-text data gathered from the same publisher, the model is compared with two other models, BADC and AC-DCO, and the clustering achieved with FMODC indicates the accuracy of the model. The proposed unsupervised approach forms initial clusters by estimating the similarity between any two documents from concept, context and semantic relevance scores, and then optimizes the clusters with a fuzzy genetic algorithm. The method represents concept as the correlation between arguments and activities in the given documents, and context as the correlation between the documents' meta-text; semantic relevance is assessed by estimating the similarity between documents through the hyponyms of the arguments. The meta-text considered for context assessment comprises the author list, the keyword list and the list of document versioning time schedules. Experiments were conducted to assess the significance of the proposed model.
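
The pairwise similarity described in the abstract can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the abstract does not give the paper's formulas, so Jaccard overlap stands in for each "correlation", the three scores are combined with equal weights, and the `Document` fields and function names are hypothetical. The versioning time schedules that the abstract lists as part of the meta-text are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical container; the field names are illustrative, not from the paper.
    authors: set = field(default_factory=set)    # meta-text: author list
    keywords: set = field(default_factory=set)   # meta-text: keyword list
    concepts: set = field(default_factory=set)   # e.g. (argument, activity) pairs
    hyponyms: set = field(default_factory=set)   # hyponyms of the arguments


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(d1, d2, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of concept, context and semantic scores (assumed form)."""
    concept = jaccard(d1.concepts, d2.concepts)
    # Context: average overlap of the two meta-text lists used here.
    context = 0.5 * jaccard(d1.authors, d2.authors) \
        + 0.5 * jaccard(d1.keywords, d2.keywords)
    semantic = jaccard(d1.hyponyms, d2.hyponyms)
    w_concept, w_context, w_semantic = weights
    return w_concept * concept + w_context * context + w_semantic * semantic
```

A matrix of such pairwise scores could seed the initial clusters, with the weights left as parameters for a later optimization stage; the fuzzy genetic algorithm itself is not sketched here because the abstract does not describe its encoding or fitness function.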
