Highly correlated feature set selection for data clustering

2014 International Conference on Recent Trends in Information Technology Pub Date : 2014-04-10 DOI:10.1109/ICRTIT.2014.6996215

M. Sumalatha, M. Ananthi, A. Arvind, N. Navin, C. Siddarth

Citations: 2

Abstract

Feature set selection is the process of identifying a subset of features that produces the same result as the entire feature set. Feature set selection helps in clustering datasets. In this paper, a Highly Correlated Feature set Selection (HCFS) algorithm is proposed for clustering data. The algorithm selects features based on their relevance and redundancy. The selected features are then clustered according to how strongly they are correlated with each other. The main objective of this paper is to identify feature subsets that improve classification performance by constructing a minimum spanning tree (MST) between the features. The HCFS algorithm works in two steps. In the first step, the features are divided into clusters using the spanning tree construction process. In the second step, cluster representatives are selected using the Frequent Pattern Analysis (FPA) technique to form an effective feature set, which reduces the time required for query evaluation. Redundant and irrelevant features are removed based on their Symmetric Uncertainty (SU) values. This improves the efficiency of the data clustering process.
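
The abstract does not give formulas or pseudocode, so the following is only a rough sketch of the pipeline it describes, written under several assumptions. SU is taken in its standard form, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), computed on discrete features. Relevance is assumed to be measured against a class label (`target`). The names `select_features` and `relevance_threshold`, the MST edge weight `1 - SU`, and the rule used to cut MST edges are all hypothetical choices made for illustration. The paper's FPA step is not specified in the abstract, so it is replaced here by a plain stand-in that keeps the most relevant feature of each cluster.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def _find(parent, n):
    """Union-find root lookup with path halving."""
    while parent[n] != n:
        parent[n] = parent[parent[n]]
        n = parent[n]
    return n


def select_features(columns, target, relevance_threshold=0.1):
    """columns: dict of feature name -> list of discrete values (assumed layout)."""
    # Relevance: SU between each feature and the assumed class label.
    relevance = {f: symmetric_uncertainty(col, target) for f, col in columns.items()}
    kept = [f for f in columns if relevance[f] > relevance_threshold]

    # Complete graph over the kept features; edges are ordered by weight
    # 1 - SU, so strongly correlated features are joined first.
    pair_su = {(a, b): symmetric_uncertainty(columns[a], columns[b])
               for i, a in enumerate(kept) for b in kept[i + 1:]}
    edges = sorted(pair_su, key=lambda e: 1.0 - pair_su[e])

    # Kruskal's algorithm builds the minimum spanning tree.
    parent = {f: f for f in kept}
    mst = []
    for a, b in edges:
        root_a, root_b = _find(parent, a), _find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b
            mst.append((a, b))

    # Illustrative cut rule (an assumption): drop an MST edge when the two
    # features are less correlated with each other than each is with the target.
    forest = [(a, b) for a, b in mst
              if not (pair_su[(a, b)] < relevance[a] and pair_su[(a, b)] < relevance[b])]

    # Connected components of the remaining forest are the feature clusters.
    parent = {f: f for f in kept}
    for a, b in forest:
        parent[_find(parent, a)] = _find(parent, b)
    clusters = {}
    for f in kept:
        clusters.setdefault(_find(parent, f), []).append(f)

    # Stand-in for the FPA step: keep the most relevant feature per cluster.
    return sorted(max(members, key=relevance.get) for members in clusters.values())
```

Calling `select_features(columns, target)` with a dictionary of discrete feature columns returns one representative name per cluster. Continuous features would need to be discretized first, since the entropy estimate above counts exact value matches.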
