A krill herd algorithm for efficient text documents clustering

Laith Mohammad Abualigah, A. Khader, M. Al-Betar, M. Awadallah
Published in: 2016 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE)
Publication date: 2016-05-30
DOI: 10.1109/ISCAIE.2016.7575039 (https://doi.org/10.1109/ISCAIE.2016.7575039)
Citations: 54

Abstract

Recently, owing to the rapid growth of web pages, social media, and modern applications, text clustering has emerged as an important technique for handling large volumes of text documents. Applying clustering to partition documents into a set of homogeneous clusters makes web pages easier to browse and tidier to present. In this paper, two novel text clustering algorithms based on the krill herd (KH) algorithm are proposed to improve the clustering of web text documents. The first method uses the basic KH algorithm with all of its operators; the second omits the genetic operators of the basic KH algorithm. The performance of the proposed KH algorithms is analyzed and compared with that of the k-means algorithm. The experiments were conducted on four standard benchmark text datasets. The results show that the proposed KH algorithms outperform the k-means algorithm in terms of cluster quality, as evaluated by two common clustering measures: Purity and Entropy.
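
The abstract names Purity and Entropy as its quality measures but does not give their formulas. The following is a minimal sketch of how these two measures are commonly computed from cluster assignments and ground-truth class labels. It is not taken from the paper: the function names `purity` and `entropy` are hypothetical, and normalizing entropy by the logarithm of the number of classes is an assumption about the variant used.

```python
import math
from collections import Counter, defaultdict


def _class_counts_per_cluster(cluster_ids, class_labels):
    """Count how many documents of each true class fall in each cluster."""
    if len(cluster_ids) != len(class_labels) or not class_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    groups = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        groups[cluster][label] += 1
    return groups


def purity(cluster_ids, class_labels):
    """Fraction of documents that belong to the majority class of their cluster.

    Ranges from 0 to 1; higher is better.
    """
    groups = _class_counts_per_cluster(cluster_ids, class_labels)
    majority_total = sum(max(counts.values()) for counts in groups.values())
    return majority_total / len(class_labels)


def entropy(cluster_ids, class_labels):
    """Size-weighted average of per-cluster class entropy.

    Assumption: each cluster's entropy is divided by log(number of classes)
    so the result lies in [0, 1]; lower is better.
    """
    groups = _class_counts_per_cluster(cluster_ids, class_labels)
    n = len(class_labels)
    num_classes = len(set(class_labels))
    norm = math.log(num_classes) if num_classes > 1 else 1.0
    total = 0.0
    for counts in groups.values():
        size = sum(counts.values())
        h = -sum((k / size) * math.log(k / size) for k in counts.values())
        total += (size / n) * (h / norm)
    return total


if __name__ == "__main__":
    # Toy example: six documents, two clusters, two true classes.
    clusters = [0, 0, 0, 1, 1, 1]
    labels = ["a", "a", "b", "b", "b", "b"]
    print(f"purity  = {purity(clusters, labels):.4f}")
    print(f"entropy = {entropy(clusters, labels):.4f}")
```

Under these definitions, a clustering that places each class in its own cluster scores a Purity of 1 and an Entropy of 0, which is why higher Purity and lower Entropy both indicate better cluster quality.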