An Efficient Semantic based Clustering Algorithm for Textual Documents

R. Karthika, L. JegathaDeborah

2018 International Conference on Circuits and Systems in Digital Enterprise Technology (ICCSDET), published 2018-12-01
DOI: 10.1109/ICCSDET.2018.8821148 (https://doi.org/10.1109/ICCSDET.2018.8821148)

Abstract

Documents belonging to many different categories flood the internet every day, and they have many links to or associations with other documents on the web. The terms in a document are often vague and open to multiple interpretations, so there is a need to determine their semantic meaning. One major application of such semantic measures is clustering related textual documents. Traditional clustering algorithms, however, may perform poorly because raw documents contain irrelevant terms. The algorithm proposed in this paper therefore uses a feature selection algorithm with a booster technique to improve the performance of the clustering algorithm. Clustering is based on a fuzzy linguistic variable measure that uses separation and dominance values, with the aim of more precise clustering. Experimental analysis shows that the three performance measures used to evaluate the clustering algorithm all improve in comparison with other existing algorithms.
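To illustrate why removing irrelevant terms can help document clustering, here is a minimal Python sketch. It is not the authors' method: the abstract does not specify the booster-based feature selection or the fuzzy linguistic measure, so a simple document-frequency filter and cosine similarity are substituted as stand-ins. The function names, thresholds, and sample documents are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def select_terms(docs, min_df=2, max_df_ratio=0.8):
    """Keep terms that appear in at least min_df documents and in at most
    max_df_ratio of all documents.

    Assumption: this document-frequency filter only stands in for the
    paper's booster-based feature selection, which the abstract does not
    describe in detail.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t for t, c in df.items() if c >= min_df and c / n <= max_df_ratio}


def vectorize(doc, vocab):
    """Term-frequency vector restricted to the selected vocabulary."""
    return Counter(t for t in tokenize(doc) if t in vocab)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy corpus: two finance documents, two sports documents.
docs = [
    "the stock market fell today",
    "the stock prices rose in the market",
    "the team won the football match",
    "the football team lost the match today",
]

vocab = select_terms(docs)
vectors = [vectorize(d, vocab) for d in docs]

print(sorted(vocab))
print(cosine(vectors[0], vectors[1]))  # finance vs. finance
print(cosine(vectors[0], vectors[2]))  # finance vs. sports
```

In this toy corpus the filter drops "the", which occurs in every document and carries no topical signal, along with terms that occur only once. A clustering step would then group documents by the similarities computed on the remaining terms.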