CARE: Finding Local Linear Correlations in High Dimensional Data.

Xiang Zhang, Feng Pan, Wei Wang
{"title":"CARE: Finding Local Linear Correlations in High Dimensional Data.","authors":"Xiang Zhang,&nbsp;Feng Pan,&nbsp;Wei Wang","doi":"10.1109/ICDE.2008.4497421","DOIUrl":null,"url":null,"abstract":"<p><p>Finding latent patterns in high dimensional data is an important research problem with numerous applications. Existing approaches can be summarized into 3 categories: feature selection, feature transformation (or feature projection) and projected clustering. Being widely used in many applications, these methods aim to capture global patterns and are typically performed in the full feature space. In many emerging biomedical applications, however, scientists are interested in the local latent patterns held by feature subsets, which may be invisible via any global transformation. In this paper, we investigate the problem of finding local linear correlations in high dimensional data. Our goal is to find the latent pattern structures that may exist only in some subspaces. We formalize this problem as finding strongly correlated feature subsets which are supported by a large portion of the data points. Due to the combinatorial nature of the problem and lack of monotonicity of the correlation measurement, it is prohibitively expensive to exhaustively explore the whole search space. In our algorithm, CARE, we utilize spectrum properties and effective heuristic to prune the search space. Extensive experimental results show that our approach is effective in finding local linear correlations that may not be identified by existing methods.</p>","PeriodicalId":74570,"journal":{"name":"Proceedings. International Conference on Data Engineering","volume":"24 ","pages":"130-139"},"PeriodicalIF":0.0000,"publicationDate":"2008-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/ICDE.2008.4497421","citationCount":"24","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. International Conference on Data Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE.2008.4497421","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 24

Abstract

Finding latent patterns in high dimensional data is an important research problem with numerous applications. Existing approaches can be summarized into 3 categories: feature selection, feature transformation (or feature projection) and projected clustering. Being widely used in many applications, these methods aim to capture global patterns and are typically performed in the full feature space. In many emerging biomedical applications, however, scientists are interested in the local latent patterns held by feature subsets, which may be invisible via any global transformation. In this paper, we investigate the problem of finding local linear correlations in high dimensional data. Our goal is to find the latent pattern structures that may exist only in some subspaces. We formalize this problem as finding strongly correlated feature subsets which are supported by a large portion of the data points. Due to the combinatorial nature of the problem and lack of monotonicity of the correlation measurement, it is prohibitively expensive to exhaustively explore the whole search space. In our algorithm, CARE, we utilize spectrum properties and effective heuristic to prune the search space. Extensive experimental results show that our approach is effective in finding local linear correlations that may not be identified by existing methods.

在高维数据中寻找局部线性相关性。
在高维数据中发现潜在模式是一个重要的研究问题,有着广泛的应用。现有的方法可以归纳为三大类:特征选择、特征变换(或特征投影)和投影聚类。这些方法在许多应用程序中广泛使用,旨在捕获全局模式,并且通常在完整的特征空间中执行。然而,在许多新兴的生物医学应用中,科学家们对特征子集持有的局部潜在模式感兴趣,这些模式可能通过任何全局转换都是不可见的。在本文中,我们研究了高维数据中寻找局部线性相关的问题。我们的目标是找到可能只存在于某些子空间中的潜在模式结构。我们将这个问题形式化为找到由大部分数据点支持的强相关特征子集。由于问题的组合性质和相关性度量的缺乏单调性,耗尽地探索整个搜索空间的代价非常高。在我们的算法CARE中,我们利用频谱特性和有效的启发式来修剪搜索空间。大量的实验结果表明,我们的方法可以有效地找到现有方法无法识别的局部线性相关性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
6.10
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信