FMODC: Fuzzy guided multi-objective document clustering by GA
A. Rao, S. Ramakrishna
2016 2nd International Conference on Contemporary Computing and Informatics (IC3I), December 2016
DOI: 10.1109/IC3I.2016.7918043
Citations: 0
Abstract
In the unsupervised learning methods prevalent in text mining, multidimensionality is becoming a major issue, because clustering does not optimally evaluate concept, context and semantic relevance, all of which are essential to the clustering process. Most earlier models considered factors such as term frequency and based clustering on a single factor, semantics, even though context and concept also play a significant role. Extending the earlier models MODC (Multi-objective Distance based Optimal Document Clustering by GA) and DC3SR, and drawing on lessons learnt from a review of earlier work, this study proposes a fuzzy guided multi-objective optimal document clustering (FMODC) approach that uses a Genetic Algorithm (GA) to support more effective computation and clustering. In experiments on meta-text data gathered from a single publisher, FMODC was compared with two other models, BADC and AC-DCO, and the optimum clustering achieved by FMODC indicates the accuracy of the model. The proposed unsupervised approach forms the initial clusters by estimating the similarity between any two documents from concept, context and semantic relevance scores, and then optimizes these clusters with a fuzzy genetic algorithm. This novel method represents concept as the correlation between arguments and activities in the given documents and context as the correlation between the documents' meta-text, and it assesses semantic relevance by estimating the similarity between documents through the hyponyms of the arguments. The meta-text used for context assessment comprises the list of authors, the list of keywords and the list of document versioning time schedules. Experiments were conducted to assess the significance of the proposed model.
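The abstract names three pairwise scores (concept, context and semantic relevance) but gives no formulas for them. The sketch below shows one way such a combined document similarity could be computed. It assumes Jaccard overlap as the "correlation" measure, equal default weights, one-level hyponym expansion, and a small hand-written hyponym table standing in for a lexical database. The `Document` fields, the function names and the weights are hypothetical and are not the authors' definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical document record; field names are illustrative, not from the paper."""
    pairs: set[tuple[str, str]]  # (argument, activity) pairs extracted from the text
    authors: set[str] = field(default_factory=set)
    keywords: set[str] = field(default_factory=set)
    version_dates: set[str] = field(default_factory=set)  # e.g. "2016-12"


def jaccard(a: set, b: set) -> float:
    """Overlap of two sets in [0, 1]; two empty sets are treated as unrelated (0.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def concept_score(d1: Document, d2: Document) -> float:
    # Assumption: "correlation between arguments and activities" is approximated
    # by the overlap of (argument, activity) pairs shared by the two documents.
    return jaccard(d1.pairs, d2.pairs)


def context_score(d1: Document, d2: Document) -> float:
    # Mean overlap of the three meta-text lists named in the abstract.
    parts = [
        jaccard(d1.authors, d2.authors),
        jaccard(d1.keywords, d2.keywords),
        jaccard(d1.version_dates, d2.version_dates),
    ]
    return sum(parts) / len(parts)


def expand_with_hyponyms(terms: set[str], hyponyms: dict[str, set[str]]) -> set[str]:
    """Add the hyponyms of each term (one level deep) to the term set."""
    expanded = set(terms)
    for term in terms:
        expanded |= hyponyms.get(term, set())
    return expanded


def semantic_score(d1: Document, d2: Document, hyponyms: dict[str, set[str]]) -> float:
    # Compare the argument sets after hyponym expansion, so "vehicle" can match "car".
    args1 = {arg for arg, _ in d1.pairs}
    args2 = {arg for arg, _ in d2.pairs}
    return jaccard(expand_with_hyponyms(args1, hyponyms),
                   expand_with_hyponyms(args2, hyponyms))


def similarity(d1: Document, d2: Document, hyponyms: dict[str, set[str]],
               weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted mean of the concept, context and semantic scores (weights are hypothetical)."""
    w_concept, w_context, w_semantic = weights
    total = (w_concept * concept_score(d1, d2)
             + w_context * context_score(d1, d2)
             + w_semantic * semantic_score(d1, d2, hyponyms))
    return total / sum(weights)


# Toy data: a hand-written hyponym table stands in for a lexical database.
HYPONYMS = {"vehicle": {"car", "truck"}}
doc_a = Document(pairs={("vehicle", "drive"), ("road", "build")},
                 authors={"Alice"}, keywords={"clustering", "GA"},
                 version_dates={"2016-12"})
doc_b = Document(pairs={("car", "drive"), ("road", "build")},
                 authors={"Alice", "Bob"}, keywords={"clustering"},
                 version_dates={"2016-12"})

print(round(similarity(doc_a, doc_b, HYPONYMS), 3))  # 0.5
```

For this toy pair the concept score is 1/3, the context score is 2/3 and the semantic score is 1/2. The hyponym table lets "vehicle" in one document match "car" in the other, which plain term overlap would miss.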
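The abstract states that the initial clusters are then optimized by a fuzzy genetic algorithm, but it does not describe the chromosome encoding, the operators, or how the fuzzy component guides the search. The sketch below is therefore a plain (crisp) GA over cluster labels, not FMODC's fuzzy GA. It assumes a precomputed pairwise similarity matrix and uses a fitness of mean intra-cluster similarity minus mean inter-cluster similarity. The function names, parameters and fitness are hypothetical.

```python
from __future__ import annotations

import random
from itertools import combinations


def fitness(labels: list[int], sim: list[list[float]]) -> float:
    """Mean intra-cluster similarity minus mean inter-cluster similarity over all document pairs."""
    intra, inter = [], []
    for i, j in combinations(range(len(labels)), 2):
        (intra if labels[i] == labels[j] else inter).append(sim[i][j])
    mean_intra = sum(intra) / len(intra) if intra else 0.0
    mean_inter = sum(inter) / len(inter) if inter else 0.0
    return mean_intra - mean_inter


def evolve_clusters(sim: list[list[float]], k: int, pop_size: int = 30,
                    generations: int = 100, mutation_rate: float = 0.2,
                    seed: int = 0) -> list[int]:
    """Evolve a label vector (one cluster id per document) with truncation selection,
    one-point crossover and random-reset mutation.
    Assumes at least 2 documents and pop_size >= 4."""
    rng = random.Random(seed)
    n = len(sim)
    population = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ind: fitness(ind, sim), reverse=True)
        parents = population[: pop_size // 2]  # the best half survives unchanged (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover position
            child = p1[:cut] + p2[cut:]
            # Random-reset mutation: each gene is redrawn with probability mutation_rate.
            child = [rng.randrange(k) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda ind: fitness(ind, sim))


# Toy similarity matrix: documents 0-1 and 2-3 form two obvious groups.
SIM = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
best = evolve_clusters(SIM, k=2)
print(best, round(fitness(best, SIM), 2))
```

On this toy matrix the best partition under this fitness is {0, 1} / {2, 3}, with a fitness of 0.85 - 0.15 = 0.70, while putting all four documents in one cluster scores only about 0.38. Keeping the best half of each generation guarantees the best fitness found never decreases.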