{"title":"Communication-Efficient Distributed Multiple Reference Pattern Matching for M2M Systems","authors":"Jui-Pin Wang, Yu-Chen Lu, Mi-Yen Yeh, Shou-de Lin, Phillip B. Gibbons","doi":"10.1109/ICDM.2013.161","DOIUrl":"https://doi.org/10.1109/ICDM.2013.161","url":null,"abstract":"In M2M applications, it is very common to encounter the ad hoc snapshot query that requires fast responses from many local machines in which all the data are distributed. In the scenario when the query is more complex, the communication cost for sending it to all the local machines for processing can be very high. This paper aims to address this issue. Given a reference set of multiple and large-size patterns, we propose an approach to identifying its k nearest and farthest neighbors globally across all the local machines. By decomposing the reference patterns into a multi-resolution representation and using novel distance bound designs, our method guarantees the exact results in a communication-efficient manner. Analytical and empirical studies show that our method outperforms the state-of-the-art methods in saving significant bandwidth usage, especially for large numbers of machines and large-sized reference patterns.","PeriodicalId":308676,"journal":{"name":"2013 IEEE 13th International Conference on Data Mining","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134112850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
James H. Faghmous, M. Le, Muhammed Uluyol, Vipin Kumar, Snigdhansu Chatterjee
{"title":"A Parameter-Free Spatio-Temporal Pattern Mining Model to Catalog Global Ocean Dynamics","authors":"James H. Faghmous, M. Le, Muhammed Uluyol, Vipin Kumar, Snigdhansu Chatterjee","doi":"10.1109/ICDM.2013.162","DOIUrl":"https://doi.org/10.1109/ICDM.2013.162","url":null,"abstract":"As spatio-temporal data have become ubiquitous, an increasing challenge facing computer scientists is that of identifying discrete patterns in continuous spatio-temporal fields. In this paper, we introduce a parameter-free pattern mining application that is able to identify dynamic anomalies in ocean data, known as ocean eddies. Despite ocean eddy monitoring being an active field of research, we provide one of the first quantitative analyses of the performance of the most used monitoring algorithms. We present an incomplete information validation technique, that uses the performance of two methods to construct an imperfect ground truth to test the significance of patterns discovered as well as the relative performance of pattern mining algorithms. These methods, in addition to the validation schemes discussed provide researchers new directions in analyzing large unlabeled climate datasets.","PeriodicalId":308676,"journal":{"name":"2013 IEEE 13th International Conference on Data Mining","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133196726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Approach to Updating Closeness Centrality and Average Path Length in Dynamic Networks","authors":"Chia-Chen Yen, Mi-Yen Yeh, Ming-Syan Chen","doi":"10.1109/ICDM.2013.135","DOIUrl":"https://doi.org/10.1109/ICDM.2013.135","url":null,"abstract":"Closeness centrality measures the communication efficiency of a specific vertex within a network while the average path length (APL) measures that of the whole network. Since the nature of these two measurements is based on the computation of all-pair shortest path distances, one can perform the breadth-first search method starting at every vertex and obtain the two measurements. However, as the edge counts in the real-world networks like Facebook increase over time, this naive way is obviously inefficient. In this paper, we proposed CENDY, an efficient approach to updating Closeness centrality and average path length in Dynamic networks when there is an edge insertion or deletion. In CENDY, we derived some theoretical properties to quickly identify a set of vertices whose shortest path changed after an edge update, and then update the closeness centralities of those vertices only as well as the APL of the graph by a few of single-source shortest path computations. We conducted extensive experiments to show that, when compared to the existing methods of computing exact or approximate values, CENDY outperformed others in significantly low update time while providing exact values of the two measurements on various real-world graph datasets.","PeriodicalId":308676,"journal":{"name":"2013 IEEE 13th International Conference on Data Mining","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129498814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Beyond Boolean Matrix Decompositions: Toward Factor Analysis and Dimensionality Reduction of Ordinal Data","authors":"R. Belohlávek, Markéta Krmelová","doi":"10.1109/ICDM.2013.127","DOIUrl":"https://doi.org/10.1109/ICDM.2013.127","url":null,"abstract":"Boolean matrix factorization (BMF), or decomposition, received a considerable attention in data mining research. In this paper, we argue that research should extend beyond the Boolean case toward more general type of data such as ordinal data. Technically, such extension amounts to replacement of the two-element Boolean algebra utilized in BMF by more general structures, which brings non-trivial challenges. We first present the problem formulation, survey the existing literature, and provide an illustrative example. Second, we present new theorems regarding decompositions of matrices with ordinal data. Third, we propose a new algorithm based on these results along with an experimental evaluation.","PeriodicalId":308676,"journal":{"name":"2013 IEEE 13th International Conference on Data Mining","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131599509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jingu Kim, Naren Ramakrishnan, M. Marwah, Amip Shah, Haesun Park
{"title":"Regularization Paths for Sparse Nonnegative Least Squares Problems with Applications to Life Cycle Assessment Tree Discovery","authors":"Jingu Kim, Naren Ramakrishnan, M. Marwah, Amip Shah, Haesun Park","doi":"10.1109/ICDM.2013.125","DOIUrl":"https://doi.org/10.1109/ICDM.2013.125","url":null,"abstract":"The nonnegative least squares problems are useful in applications where the physical nature of problem domain permits only additive linear combinations. We discuss the l1-regularized nonnegative least squares (L1-NLS) problem, where l1-regularization is used to induce sparsity. Although l1-regularization has been successfully used in least squares regression, when combined with nonnegativity constraints, developments of algorithms and their understandings have been limited. We propose an algorithm that generates the entire regularization paths of the L1-NLS problem. We prove the correctness of the proposed algorithm and illustrate a novel application in environmental sustainability. The application relates to life cycle assessment (LCA), a technique used to estimate environmental impact during the entire lifetime of a product. We address an inverse problem in LCA. Given environmental impact factors of a target product and of a large library of constituents, the goal is to reverse engineer an inventory tree for the product. Using real-world data sets, we demonstrate how our L1-NLS approach controls the size of discovered trees, and how the full regularization paths effectively illustrate the spectrum of discovered trees with varying sparsity and compositions.","PeriodicalId":308676,"journal":{"name":"2013 IEEE 13th International Conference on Data Mining","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115175754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structural-Context Similarities for Uncertain Graphs","authors":"Zhaonian Zou, Jianzhong Li","doi":"10.1109/ICDM.2013.22","DOIUrl":"https://doi.org/10.1109/ICDM.2013.22","url":null,"abstract":"Structural-context similarities between vertices in graphs, such as the Jaccard similarity, the Dice similarity, and the cosine similarity, play important roles in a number of graph data analysis techniques. However, uncertainty is inherent in massive graph data, and therefore the classical definitions of structural-context similarities on exact graphs don't make sense on uncertain graphs. In this paper, we propose a generic definition of structural-context similarity for uncertain graphs. Since it is computationally prohibitive to compute the similarity between two vertices of an uncertain graph directly by its definition, we investigate two efficient approaches to computing similarities, namely the polynomial-time exact algorithms and the linear-time approximation algorithms. The experimental results on real uncertain graphs verify the effectiveness of the proposed structural-context similarities as well as the accuracy and efficiency of the proposed evaluation algorithms.","PeriodicalId":308676,"journal":{"name":"2013 IEEE 13th International Conference on Data Mining","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123851084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xueqing Gong, Xinyu Guo, Rong Zhang, Xiaofeng He, Aoying Zhou
{"title":"Search Behavior Based Latent Semantic User Segmentation for Advertising Targeting","authors":"Xueqing Gong, Xinyu Guo, Rong Zhang, Xiaofeng He, Aoying Zhou","doi":"10.1109/ICDM.2013.62","DOIUrl":"https://doi.org/10.1109/ICDM.2013.62","url":null,"abstract":"The popularity of internet usage greatly motivates the online advertising activities. Compared to advertising on traditional media, online advertising has rich information as well as necessary techniques to achieve precise user targeting. This rich information includes the search behaviors of a user, such as queries issued, or the ads clicked by the user. For popular websites with large number of active users, ad delivery targeting at individual users puts too much burden on the system. User segmentation is an alternative way to relieve this burden by grouping users of similar interests together, then the ad delivery system targets the user segments to display relevant ads, instead of individual users. Existing user segmentation work either adapts clustering methods without considering the hidden semantics embedded in the data, such as K-means, or treats users as data instance and clusters users indirectly even if the latent semantics is incorporated into the transformed data, such as PLSA or LDA. In this paper, we present a search behavior based latent semantic user segmentation method and validate its effectiveness on new ads. Instead of treating users as data instances, they are used as attributes of user issued queries or clicked ads which are considered to be data instances. LDA is then applied to this data set to directly obtain the user segments. Compared to popular K-means clustering, our approach achieves higher CTR values on new ads, with only simple search information.","PeriodicalId":308676,"journal":{"name":"2013 IEEE 13th International Conference on Data Mining","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123818840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Anomalous Hotspot Discovery in Graph Streams","authors":"Weiren Yu, C. Aggarwal, Shuai Ma, Haixun Wang","doi":"10.1109/ICDM.2013.32","DOIUrl":"https://doi.org/10.1109/ICDM.2013.32","url":null,"abstract":"Network streams have become ubiquitous in recent years because of many dynamic applications. Such streams may show localized regions of activity and evolution because of anomalous events. This paper will present methods for dynamically determining anomalous hot spots from network streams. These are localized regions of sudden activity or change in the underlying network. We will design a localized principal component analysis algorithm, which can continuously maintain the information about the changes in the different neighborhoods of the network. We will use a fast incremental eigenvector update algorithm based on von Mises iterations in a lazy way in order to efficiently maintain local correlation information. This is used to discover local change hotspots in dynamic streams. We will finally present an experimental study to demonstrate the effectiveness and efficiency of our approach.","PeriodicalId":308676,"journal":{"name":"2013 IEEE 13th International Conference on Data Mining","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123159671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Temporal Adoptions Using Dynamic Matrix Factorization","authors":"Freddy Chongtat Chua, R. J. Oentaryo, Ee-Peng Lim","doi":"10.1109/ICDM.2013.25","DOIUrl":"https://doi.org/10.1109/ICDM.2013.25","url":null,"abstract":"The problem of recommending items to users is relevant to many applications and the problem has often been solved using methods developed from Collaborative Filtering (CF). Collaborative Filtering model-based methods such as Matrix Factorization have been shown to produce good results for static rating-type data, but have not been applied to time-stamped item adoption data. In this paper, we adopted a Dynamic Matrix Factorization (DMF) technique to derive different temporal factorization models that can predict missing adoptions at different time steps in the users' adoption history. This DMF technique is an extension of the Non-negative Matrix Factorization (NMF) based on the well-known class of models called Linear Dynamical Systems (LDS). By evaluating our proposed models against NMF and TimeSVD++ on two real datasets extracted from ACM Digital Library and DBLP, we show empirically that DMF can predict adoptions more accurately than the NMF for several prediction tasks as well as outperforming TimeSVD++ in some of the prediction tasks. We further illustrate the ability of DMF to discover evolving research interests for a few author examples.","PeriodicalId":308676,"journal":{"name":"2013 IEEE 13th International Conference on Data Mining","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128642091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Junfu Yin, Z. Zheng, Longbing Cao, Yin Song, Wei Wei
{"title":"Efficiently Mining Top-K High Utility Sequential Patterns","authors":"Junfu Yin, Z. Zheng, Longbing Cao, Yin Song, Wei Wei","doi":"10.1109/ICDM.2013.148","DOIUrl":"https://doi.org/10.1109/ICDM.2013.148","url":null,"abstract":"High utility sequential pattern mining is an emerging topic in the data mining community. Compared to the classic frequent sequence mining, the utility framework provides more informative and actionable knowledge since the utility of a sequence indicates business value and impact. However, the introduction of \"utility\" makes the problem fundamentally different from the frequency-based pattern mining framework and brings about dramatic challenges. Although the existing high utility sequential pattern mining algorithms can discover all the patterns satisfying a given minimum utility, it is often difficult for users to set a proper minimum utility. A too small value may produce thousands of patterns, whereas a too big one may lead to no findings. In this paper, we propose a novel framework called top-k high utility sequential pattern mining to tackle this critical problem. Accordingly, an efficient algorithm, Top-k high Utility Sequence (TUS for short) mining, is designed to identify top-k high utility sequential patterns without minimum utility. In addition, three effective features are introduced to handle the efficiency problem, including two strategies for raising the threshold and one pruning for filtering unpromising items. Our experiments are conducted on both synthetic and real datasets. The results show that TUS incorporating the efficiency-enhanced strategies demonstrates impressive performance without missing any high utility sequential patterns.","PeriodicalId":308676,"journal":{"name":"2013 IEEE 13th International Conference on Data Mining","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128400731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}