{"title":"Boolean Tensor Factorizations","authors":"Pauli Miettinen","doi":"10.1109/ICDM.2011.28","DOIUrl":"https://doi.org/10.1109/ICDM.2011.28","url":null,"abstract":"Tensors are multi-way generalizations of matrices, and similarly to matrices, they can also be factorized, that is, represented (approximately) as a product of factors. These factors are typically either all matrices or a mixture of matrices and tensors. With the widespread adoption of matrix factorization techniques in data mining, tensor factorizations have also started to gain attention. In this paper we study Boolean tensor factorizations. We assume that the data is binary multi-way data, and we want to factorize it to binary factors using Boolean arithmetic (i.e. defining that 1+1=1). Boolean tensor factorizations are, therefore, a natural generalization of Boolean matrix factorizations. We will study the theory of Boolean tensor factorizations and show that at least some of the benefits Boolean matrix factorizations have over normal matrix factorizations carry over to the tensor data. We will also present algorithms for Boolean variations of CP and Tucker decompositions, the two most common types of tensor factorizations. With experimentation done with synthetic and real-world data, we show that Boolean tensor factorizations are a viable alternative when the data is naturally binary.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"741 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131885458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Optimal Discriminating Order for Multiclass Classification","authors":"Dong Liu, Shuicheng Yan, Yadong Mu, Xiansheng Hua, Shih-Fu Chang, HongJiang Zhang","doi":"10.1109/ICDM.2011.147","DOIUrl":"https://doi.org/10.1109/ICDM.2011.147","url":null,"abstract":"In this paper, we investigate how to design an optimized discriminating order for boosting multiclass classification. The main idea is to optimize a binary tree architecture, referred to as Sequential Discriminating Tree (SDT), that performs the multiclass classification through a hierarchical sequence of coarse-to-fine binary classifiers. To infer such a tree architecture, we employ the constrained large margin clustering procedure which enforces samples belonging to the same class to locate at the same side of the hyper plane while maximizing the margin between these two partitioned class subsets. The proposed SDT algorithm has a theoretic error bound which is shown experimentally to effectively guarantee the generalization performance. Experiment results indicate that SDT clearly beats the state-of-the-art multiclass classification algorithms.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133818802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Multi-task Learning Method for Personalized Activity Recognition","authors":"Xuan Sun, H. Kashima, Ryota Tomioka, N. Ueda, Ping Li","doi":"10.1109/ICDM.2011.14","DOIUrl":"https://doi.org/10.1109/ICDM.2011.14","url":null,"abstract":"Personalized activity recognition usually faces the problem of data sparseness. We aim at improving accuracy of personalized activity recognition by incorporating the information from other persons. We propose a new online multi-task learning method for personalized activity recognition. The proposed online multi-task learning method automatically learns the \"transfer-factors\" (similarities) among different tasks (i.e., among different persons in our case). Experiments demonstrate that the proposed method significantly outperforms existing methods. The novelty of this paper is twofold: (1) A new multi-task learning framework, which can naturally learn similarities among tasks, (2) To our knowledge, this is the first study of large-scale personalized activity recognition.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"11 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132870994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining Feature Context and Spatial Context for Image Pattern Discovery","authors":"Hongxing Wang, Junsong Yuan, Yap-Peng Tan","doi":"10.1109/ICDM.2011.38","DOIUrl":"https://doi.org/10.1109/ICDM.2011.38","url":null,"abstract":"Once an image is decomposed into a number of visual primitives, e.g., local interest points or salient image regions, it is of great interest to discover meaningful visual patterns from them. Conventional clustering (e.g., k-means) of visual primitives, however, usually ignores the spatial dependency among them, and thus cannot discover the high-level visual patterns of complex spatial structure. To overcome this problem, we propose to consider both spatial and feature contexts among visual primitives for pattern discovery. By discovering both spatial co-occurrence patterns among visual primitives and feature co-occurrence patterns among different types of features, our method can better handle the ambiguities of visual primitives, by leveraging these co-occurrences. We formulate the problem as a regularized k-means clustering, and propose an iterative bottom-up/top-down self-learning procedure to gradually refine the result until it converges. The experiments on image texton discovery and image region clustering show that combining spatial and feature contexts can significantly improve the pattern discovery results.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133030588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recursive Multi-step Time Series Forecasting by Perturbing Data","authors":"S. B. Taieb, Gianluca Bontempi","doi":"10.1109/ICDM.2011.123","DOIUrl":"https://doi.org/10.1109/ICDM.2011.123","url":null,"abstract":"The Recursive strategy is the oldest and most intuitive strategy to forecast a time series multiple steps ahead. At the same time, it is well-known that this strategy suffers from the accumulation of errors as the forecasting horizon increases. We propose a variant of the Recursive strategy, called RECNOISY, which perturbs the initial dataset at each step of the forecasting process in order to i) handle more properly the estimated values at each forecasting step and ii) decrease the accumulation of errors induced by the Recursive strategy. In addition to the RECNOISY strategy, we propose another strategy, called HYBRID, which for each horizon selects the most accurate approach among the REC and the RECNOISY strategies according to the estimated accuracy. In order to assess the effectiveness of the proposed strategies, we carry out an experimental session based on the 111 time series of the NN5 forecasting competition. Accuracy results are presented together with a paired comparison over the horizons and the time series. The preliminary results show that our proposed approaches are promising in terms of forecasting performance.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117248861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tracking and Connecting Topics via Incremental Hierarchical Dirichlet Processes","authors":"Zekai J. Gao, Yangqiu Song, Shixia Liu, Haixun Wang, Hao Wei, Yang Chen, Weiwei Cui","doi":"10.1109/ICDM.2011.148","DOIUrl":"https://doi.org/10.1109/ICDM.2011.148","url":null,"abstract":"Much research has been devoted to topic detection from text, but one major challenge has not been addressed: revealing the rich relationships that exist among the detected topics. Finding such relationships is important since many applications are interested in how topics come into being, how they develop, grow, disintegrate, and finally disappear. In this paper, we present a novel method that reveals the connections between topics discovered from the text data. Specifically, our method focuses on how one topic splits into multiple topics, and how multiple topics merge into one topic. We adopt the hierarchical Dirichlet process (HDP) model, and propose an incremental Gibbs sampling algorithm to incrementally derive and refine the labels of clusters. We then characterize the splitting and merging patterns among clusters based on how labels change. We propose a global analysis process that focuses on cluster splitting and merging, and a finer granularity analysis process that helps users to better understand the content of the clusters and the evolution patterns. We also develop a visualization process to present the results.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125316466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Novel Recommendation Based on Personal Popularity Tendency","authors":"Jinoh Oh, Sun Park, Hwanjo Yu, Min Song, Seung-Taek Park","doi":"10.1109/ICDM.2011.110","DOIUrl":"https://doi.org/10.1109/ICDM.2011.110","url":null,"abstract":"Recently, novel recommender systems have attracted considerable attention in the research community. Recommending popular items may not always satisfy users. For example, although most users likely prefer popular items, such items are often not very surprising or novel because users may already know about the items. Also, such recommender systems hardly satisfy a group of users who prefer relatively obscure items. Existing novel recommender systems, however, still recommend mainly popular items or degrade the quality of recommendation. They do so because they do not consider the balance between novelty and preference-based recommendation. This paper proposes an efficient novel-recommendation method called Personal Popularity Tendency Matching (PPTM) which recommends novel items by considering an individual's Personal Popularity Tendency (or PPT). Considering PPT helps to diversify recommendations by reasonably penalizing popular items while improving the recommendation accuracy. We experimentally show that the proposed method, PPTM, is better than other methods in terms of both novelty and accuracy.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121672193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Taxi Driving Fraud Detection System","authors":"Yong Ge, Hui Xiong, Chuanren Liu, Zhi-Hua Zhou","doi":"10.1109/ICDM.2011.18","DOIUrl":"https://doi.org/10.1109/ICDM.2011.18","url":null,"abstract":"Advances in GPS tracking technology have enabled us to install GPS tracking devices in city taxis to collect a large amount of GPS traces under operational time constraints. These GPS traces provide unparalleled opportunities for us to uncover taxi driving fraud activities. In this paper, we develop a taxi driving fraud detection system, which is able to systematically investigate taxi driving fraud. In this system, we first provide functions to find two aspects of evidences: travel route evidence and driving distance evidence. Furthermore, a third function is designed to combine the two aspects of evidences based on Dempster-Shafer theory. To implement the system, we first identify interesting sites from a large amount of taxi GPS logs. Then, we propose a parameter-free method to mine the travel route evidences. Also, we introduce route mark to represent a typical driving path from an interesting site to another one. Based on route mark, we exploit a generative statistical model to characterize the distribution of driving distance and identify the driving distance evidences. Finally, we evaluate the taxi driving fraud detection system with large scale real-world taxi GPS logs. In the experiments, we uncover some regularity of driving fraud activities and investigate the motivation of drivers to commit a driving fraud by analyzing the produced taxi fraud data.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114686055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Beyond 'Caveman Communities': Hubs and Spokes for Graph Compression and Mining","authors":"U. Kang, C. Faloutsos","doi":"10.1109/ICDM.2011.26","DOIUrl":"https://doi.org/10.1109/ICDM.2011.26","url":null,"abstract":"Given a real world graph, how should we lay-out its edges? How can we compress it? These questions are closely related, and the typical approach so far is to find clique-like communities, like the 'cavemen graph', and compress them. We show that the block-diagonal mental image of the 'cavemen graph' is the wrong paradigm, in full agreement with earlier results that real world graphs have no good cuts. Instead, we propose to envision graphs as a collection of hubs connecting spokes, with super-hubs connecting the hubs, and so on, recursively. Based on the idea, we propose the Slash Burn method (burn the hubs, and slash the remaining graph into smaller connected components). Our view point has several advantages: (a) it avoids the 'no good cuts' problem, (b) it gives better compression, and (c) it leads to faster execution times for matrix-vector operations, which are the back-bone of most graph processing tools. Experimental results show that our Slash Burn method consistently outperforms other methods on all datasets, giving good compression and faster running time.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128575512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ranking Web-Based Partial Orders by Significance Using a Markov Reference Model","authors":"Michel Speiser, G. Antonini, A. Labbi","doi":"10.1109/ICDM.2011.122","DOIUrl":"https://doi.org/10.1109/ICDM.2011.122","url":null,"abstract":"Mining web traffic data has been addressed in literature mostly using sequential pattern mining techniques. Recently, a more powerful pattern called partial order was introduced, with the hope of providing a more compact result set. A further approach towards this goal, valid for both sequential patterns and partial orders, consists in building a statistical significance test for frequent patterns. Our method is based on probabilistic generative models and provides a direct way to rank the extracted patterns. It leaves open the number of patterns of interest, which depends on the application, but provides an alternative criterion to frequency of occurrence: statistical significance. In this paper, we focus on the construction of an algorithm which calculates the probability of partial orders under a first-order Markov reference model, and we show how to use those probabilities to assess the statistical significance of a set of mined partial orders.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"07 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128948073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}