A krill herd algorithm for efficient text documents clustering

2016 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE), Pub Date: 2016-05-30, DOI: 10.1109/ISCAIE.2016.7575039

Laith Mohammad Abualigah, A. Khader, M. Al-Betar, M. Awadallah


Citations: 54

Abstract

Recently, owing to the rapid growth of web pages, social media, and modern applications, text clustering has emerged as an important technique for handling large volumes of text documents. Clustering partitions documents into homogeneous clusters, which makes web pages easier to browse and more tidily presented. In this paper, two novel text clustering algorithms based on the krill herd (KH) algorithm are proposed to improve the clustering of web text documents. The first method uses the basic KH algorithm with all of its operators; the second omits the genetic operators of the basic KH algorithm. The performance of the proposed KH algorithms is analyzed and compared with that of the k-means algorithm. The experiments were conducted on four standard benchmark text datasets. The results showed that the proposed KH algorithms outperformed the k-means algorithm in terms of cluster quality, as evaluated by two common clustering measures, Purity and Entropy.
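The abstract names Purity and Entropy as the evaluation measures but does not define them. As a rough illustration only, the sketch below computes both using their common textbook definitions: purity is the fraction of documents that belong to the majority class of their cluster, and entropy is the size-weighted average of each cluster's class entropy. The function names `purity` and `entropy`, the label encoding, and the normalization of entropy by the logarithm of the number of classes are assumptions made for this example; the paper may use a different variant, and this is not the authors' code.

```python
import math
from collections import Counter


def _class_counts_per_cluster(labels_true, labels_pred):
    """Map each predicted cluster id to a Counter of true class labels."""
    clusters = {}
    for true_label, cluster_id in zip(labels_true, labels_pred):
        clusters.setdefault(cluster_id, Counter())[true_label] += 1
    return clusters


def purity(labels_true, labels_pred):
    """Fraction of documents that fall in the majority class of their cluster."""
    clusters = _class_counts_per_cluster(labels_true, labels_pred)
    n = len(labels_true)
    return sum(max(counts.values()) for counts in clusters.values()) / n


def entropy(labels_true, labels_pred):
    """Size-weighted average class entropy of the clusters.

    Assumption: each cluster's entropy is divided by log(number of classes)
    so the result lies in [0, 1]; other texts omit this normalization.
    """
    clusters = _class_counts_per_cluster(labels_true, labels_pred)
    n = len(labels_true)
    num_classes = len(set(labels_true))
    norm = math.log(num_classes) if num_classes > 1 else 1.0
    total = 0.0
    for counts in clusters.values():
        size = sum(counts.values())
        h = -sum((c / size) * math.log(c / size) for c in counts.values())
        total += (size / n) * (h / norm)
    return total


if __name__ == "__main__":
    # Toy labels, purely illustrative (not from the paper's datasets).
    true_classes = ["sports", "sports", "politics", "politics"]
    predicted_clusters = [0, 0, 1, 0]
    print("purity :", purity(true_classes, predicted_clusters))
    print("entropy:", entropy(true_classes, predicted_clusters))
```

Under these definitions, higher purity and lower entropy both indicate clusters that align more closely with the reference classes, which is the sense in which the abstract reports the KH variants outperforming k-means.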

