A Novel Scalable Signature Based Subspace Clustering Approach for Big Data

Int. J. Inf. Technol. Web Eng. | Pub Date: 2019-04-01 | DOI: 10.4018/IJITWE.2019040103

T. Gayathri, D. Bhaskari


Citations: 2

Abstract



“Big data,” as the name suggests, is a collection of large and complicated data sets that are usually hard to process with on-hand data management tools or other conventional processing applications. This article presents a scalable signature based subspace clustering approach that avoids identifying redundant clusters. Experiments with various distance measures validate the performance of the proposed algorithm. For the same purpose of validation, the chosen synthetic data sets have different dimensions, and their size will be distributed when opened with Weka. The F1 quality measure and the runtime are computed on these synthetic data sets. The performance of the proposed algorithm is compared with existing clustering algorithms such as CLIQUE, INSCY and SUNCLU.
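
The abstract does not say how the F1 quality measure is computed. As a rough illustration only, the sketch below assumes one common object-based convention for clustering evaluation: each ground-truth cluster is matched to the found cluster that gives it the highest F1, and those best scores are averaged. The function names `f1_score` and `clustering_f1` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Set


def f1_score(found: Set[int], truth: Set[int]) -> float:
    """Harmonic mean of precision and recall between two sets of object ids."""
    overlap = len(found & truth)
    if overlap == 0:
        return 0.0
    precision = overlap / len(found)   # share of the found cluster that is correct
    recall = overlap / len(truth)      # share of the true cluster that was recovered
    return 2 * precision * recall / (precision + recall)


def clustering_f1(found_clusters: List[Set[int]],
                  true_clusters: List[Set[int]]) -> float:
    """Average, over true clusters, of the best F1 reached by any found cluster.

    Assumption: this matching rule is one common convention, not necessarily
    the one used by the authors.
    """
    if not true_clusters:
        return 0.0
    best = [
        max((f1_score(f, t) for f in found_clusters), default=0.0)
        for t in true_clusters
    ]
    return sum(best) / len(best)


if __name__ == "__main__":
    # Toy example: object ids grouped into hidden (true) and found clusters.
    true_clusters = [{0, 1, 2, 3}, {4, 5, 6}]
    found_clusters = [{0, 1, 2}, {4, 5, 6}, {7}]
    print(round(clustering_f1(found_clusters, true_clusters), 4))
```

Under this convention a redundant found cluster does not raise the score, because only the best match per true cluster counts. A measure that also penalizes redundancy would need an extra term.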


Source journal

Int. J. Inf. Technol. Web Eng.

Self-citation rate

0.00%

Publication count