用于数据聚类的高度相关特征集选择

M. Sumalatha, M. Ananthi, A. Arvind, N. Navin, C. Siddarth
{"title":"用于数据聚类的高度相关特征集选择","authors":"M. Sumalatha, M. Ananthi, A. Arvind, N. Navin, C. Siddarth","doi":"10.1109/ICRTIT.2014.6996215","DOIUrl":null,"url":null,"abstract":"Feature set selection is the process of identifying a subset of features which produces the result same as the entire set. The feature set selection helps in clustering the datasets. In this paper, a Highly Correlated Feature set Selection (HCFS) algorithmis proposed for clustering the data. This algorithm helps in selecting features based on its relevancy and redundancy factors. All the selected features are finally clustered based on how they are correlated with each other. The main objective of this paper is to identify the feature subsets which will improve the classification performance by constructing minimum spanning tree (MST) between the features.The HCFS algorithm works in two steps. In the first step, the features are divided into clusters using the spanning tree construction process. In the second step, the cluster representatives are selected using Frequent Pattern Analysis (FPA) technique to form the effective feature set which reduces the time required for query evaluation process. The redundant and irrelevant features are removed based on their Symmetric Uncertainty (SU) values. This effectively improves the efficiency of data clustering process.","PeriodicalId":422275,"journal":{"name":"2014 International Conference on Recent Trends in Information Technology","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Highly correlated feature set selection for data clustering\",\"authors\":\"M. Sumalatha, M. Ananthi, A. Arvind, N. Navin, C. Siddarth\",\"doi\":\"10.1109/ICRTIT.2014.6996215\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Feature set selection is the process of identifying a subset of features which produces the result same as the entire set. The feature set selection helps in clustering the datasets. In this paper, a Highly Correlated Feature set Selection (HCFS) algorithmis proposed for clustering the data. This algorithm helps in selecting features based on its relevancy and redundancy factors. All the selected features are finally clustered based on how they are correlated with each other. The main objective of this paper is to identify the feature subsets which will improve the classification performance by constructing minimum spanning tree (MST) between the features.The HCFS algorithm works in two steps. In the first step, the features are divided into clusters using the spanning tree construction process. In the second step, the cluster representatives are selected using Frequent Pattern Analysis (FPA) technique to form the effective feature set which reduces the time required for query evaluation process. The redundant and irrelevant features are removed based on their Symmetric Uncertainty (SU) values. This effectively improves the efficiency of data clustering process.\",\"PeriodicalId\":422275,\"journal\":{\"name\":\"2014 International Conference on Recent Trends in Information Technology\",\"volume\":\"29 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-04-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 International Conference on Recent Trends in Information Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICRTIT.2014.6996215\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 International Conference on Recent Trends in Information Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICRTIT.2014.6996215","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

特征集选择是识别特征子集的过程,该子集产生与整个集合相同的结果。特征集选择有助于对数据集进行聚类。本文提出了一种用于数据聚类的高度相关特征集选择(HCFS)算法。该算法可以根据特征的相关度和冗余度进行特征的选择。最后,根据所选特征之间的相互关系对它们进行聚类。本文的主要目标是通过在特征子集之间构造最小生成树(MST)来识别特征子集,从而提高分类性能。HCFS算法分为两个步骤。在第一步中,使用生成树构建过程将特征划分为簇。第二步,使用频繁模式分析(FPA)技术选择聚类代表,形成有效的特征集,减少查询评估过程所需的时间。根据冗余和不相关特征的对称不确定性(SU)值去除冗余和不相关特征。这有效地提高了数据聚类过程的效率。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Highly correlated feature set selection for data clustering
Feature set selection is the process of identifying a subset of features which produces the result same as the entire set. The feature set selection helps in clustering the datasets. In this paper, a Highly Correlated Feature set Selection (HCFS) algorithmis proposed for clustering the data. This algorithm helps in selecting features based on its relevancy and redundancy factors. All the selected features are finally clustered based on how they are correlated with each other. The main objective of this paper is to identify the feature subsets which will improve the classification performance by constructing minimum spanning tree (MST) between the features.The HCFS algorithm works in two steps. In the first step, the features are divided into clusters using the spanning tree construction process. In the second step, the cluster representatives are selected using Frequent Pattern Analysis (FPA) technique to form the effective feature set which reduces the time required for query evaluation process. The redundant and irrelevant features are removed based on their Symmetric Uncertainty (SU) values. This effectively improves the efficiency of data clustering process.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信