Co-clustering of Lagged Data

Eran Shaham, David Sarne, Boaz Ben-Moshe
{"title":"Co-clustering of Lagged Data","authors":"Eran Shaham, David Sarne, Boaz Ben-Moshe","doi":"10.1109/ICDM.2010.44","DOIUrl":null,"url":null,"abstract":"The paper focuses on mining clusters that are characterized by a lagged relationship between the data objects. We call such clusters lagged co-clusters. A lagged co-cluster of a matrix is a sub matrix determined by a subset of rows and their corresponding lag over a subset of columns. Extracting such subsets (not necessarily successive) may reveal an underlying governing regulatory mechanism. Such a regulatory mechanism is quite common in real life settings. It appears in a variety of fields: meteorology, seismic activity, stock market behavior, neuronal brain activity, river flow and navigation, are but a limited list of examples. Mining such lagged co-clusters not only helps in understanding the relationship between objects in the domain, but assists in forecasting their future behavior. For most interesting variants of this problem, finding an optimal lagged co-cluster is an NP-complete problem. We present a polynomial-time Monte-Carlo algorithm for finding a set of lagged co-clusters whose error does not exceed a pre-specified value, which handles noise, anti-correlations, missing values, and overlapping patterns. Moreover, we prove that the list includes, with fixed probability, a lagged co-cluster which is optimal in its dimensions. The algorithm was extensively evaluated using various environments. First, artificial data, enabling the evaluation of specific, isolated properties of the algorithm. Secondly, real-world data, using river flow and topographic data, enabling the evaluation of the algorithm to efficiently mine relevant and coherent lagged co-clusters in environments that are temporal, i.e., time reading data, and non-temporal, respectively.","PeriodicalId":294061,"journal":{"name":"2010 IEEE International Conference on Data Mining","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE International Conference on Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2010.44","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The paper focuses on mining clusters that are characterized by a lagged relationship between the data objects. We call such clusters lagged co-clusters. A lagged co-cluster of a matrix is a sub matrix determined by a subset of rows and their corresponding lag over a subset of columns. Extracting such subsets (not necessarily successive) may reveal an underlying governing regulatory mechanism. Such a regulatory mechanism is quite common in real life settings. It appears in a variety of fields: meteorology, seismic activity, stock market behavior, neuronal brain activity, river flow and navigation, are but a limited list of examples. Mining such lagged co-clusters not only helps in understanding the relationship between objects in the domain, but assists in forecasting their future behavior. For most interesting variants of this problem, finding an optimal lagged co-cluster is an NP-complete problem. We present a polynomial-time Monte-Carlo algorithm for finding a set of lagged co-clusters whose error does not exceed a pre-specified value, which handles noise, anti-correlations, missing values, and overlapping patterns. Moreover, we prove that the list includes, with fixed probability, a lagged co-cluster which is optimal in its dimensions. The algorithm was extensively evaluated using various environments. First, artificial data, enabling the evaluation of specific, isolated properties of the algorithm. Secondly, real-world data, using river flow and topographic data, enabling the evaluation of the algorithm to efficiently mine relevant and coherent lagged co-clusters in environments that are temporal, i.e., time reading data, and non-temporal, respectively.
滞后数据的共聚类
本文的重点是挖掘以数据对象之间的滞后关系为特征的集群。我们称这种集群为滞后共集群。矩阵的滞后共簇是由行子集及其对应的列子集上的滞后决定的子矩阵。提取这样的子集(不一定是连续的)可能揭示一个潜在的管理调节机制。这样的监管机制在现实生活中很常见。它出现在许多领域:气象学、地震活动、股票市场行为、神经大脑活动、河流流动和导航,这些都是有限的例子。挖掘这种滞后的共簇不仅有助于理解领域中对象之间的关系,而且有助于预测它们未来的行为。对于这个问题的大多数有趣的变体,找到最优滞后共簇是一个np完全问题。我们提出了一种多项式时间蒙特卡罗算法,用于寻找一组误差不超过预设值的滞后共聚类,该算法可以处理噪声、反相关性、缺失值和重叠模式。此外,我们还以固定的概率证明了该列表包含一个维度最优的滞后共簇。该算法在各种环境下进行了广泛的评估。首先,人工数据,使评估具体的,孤立的性质的算法。其次,现实世界数据,使用河流流量和地形数据,使算法的评估能够有效地挖掘相关和相干滞后共簇,分别在时间环境中,即时间读取数据和非时间环境中。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信