FMODC: Fuzzy guided multi-objective document clustering by GA

A. Rao, S. Ramakrishna
2016 2nd International Conference on Contemporary Computing and Informatics (IC3I), published 2016-12-01
DOI: 10.1109/IC3I.2016.7918043 (https://doi.org/10.1109/IC3I.2016.7918043)

Abstract

In the prevailing unsupervised learning methods for text mining, multi-dimensionality is becoming a major issue, because clustering does not focus on optimal evaluation of concept, context and semantic relevance, all of which are essential to the clustering process. Most earlier models consider factors such as term frequency and cluster on the semantic factor alone, although contextual and conceptual factors are also significant. Extending the earlier MODC and DC3SR models, this study proposes Multi-objective Distance based Optimal Document Clustering (MODC) by GA. Drawing on lessons learnt from the review of earlier models, the study discusses the scope for a fuzzy guided multi-objective optimal document clustering (FMODC) approach, which is intended to support more effective computation and clustering using the Genetic Algorithm. In the experiments, which use meta-text data gathered from the same publisher, the model is compared with two other models, BADC and AC-DCO, and the clustering achieved with FMODC indicates the accuracy of the model and the system.

The proposed unsupervised learning approach forms the initial clusters by estimating the similarity between any two documents from concept, context and semantic relevance scores, and then optimizes the clusters with a fuzzy genetic algorithm. The method represents concept as the correlation between arguments and activities in the given documents and context as the correlation between the documents' meta-text; semantic relevance is assessed by estimating the similarity between documents through the hyponyms of the arguments. The meta-text considered for context assessment contains the authors list, the keywords list and the list of document versioning time schedules. Experiments were conducted to assess the significance of the proposed model.
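The abstract does not give the formulas behind the three scores, so the following is only a minimal sketch of how a pairwise document similarity could be assembled from concept, context and semantic relevance scores. Everything specific here is an assumption for illustration: the use of Jaccard overlap for the meta-text correlation, the equal weighting, the omission of the versioning time schedules, and the names `jaccard`, `context_score` and `combined_similarity` are hypothetical and are not taken from the paper.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets, in [0, 1]; two empty sets give 0.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def context_score(meta1, meta2):
    """Illustrative context score from two meta-text fields.

    Assumption: each meta-text record is a dict with 'authors' and
    'keywords' sets, and the score is the mean of their Jaccard overlaps.
    The paper also lists versioning time schedules, which are omitted here.
    """
    authors = jaccard(meta1["authors"], meta2["authors"])
    keywords = jaccard(meta1["keywords"], meta2["keywords"])
    return (authors + keywords) / 2.0


def combined_similarity(concept, context, semantic, weights=(1.0, 1.0, 1.0)):
    """Weighted mean of the three scores (the weighting is an assumption)."""
    scores = (concept, context, semantic)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Hypothetical meta-text for two documents.
doc_a = {"authors": {"A. Rao", "S. Ramakrishna"},
         "keywords": {"clustering", "fuzzy", "GA"}}
doc_b = {"authors": {"A. Rao"},
         "keywords": {"clustering", "GA", "text mining"}}

ctx = context_score(doc_a, doc_b)
# The concept and semantic scores below are placeholder values, not results.
sim = combined_similarity(concept=0.2, context=ctx, semantic=0.8)
```

In a pipeline of the kind the abstract describes, a pairwise score like `sim` would be used to form the initial clusters, and a fuzzy genetic algorithm would then refine them; that optimization step is not sketched here because the abstract gives no detail about it.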