{"title":"Discovering Flow Anomalies: A SWEET Approach","authors":"James M. Kang, S. Shekhar, Christine Wennen, P. Novak","doi":"10.1109/ICDM.2008.117","DOIUrl":"https://doi.org/10.1109/ICDM.2008.117","url":null,"abstract":"Given a percentage-threshold and readings from a pair of consecutive upstream and downstream sensors, flow anomaly discovery identifies dominant time intervals where the fraction of time instants of significantly mis-matched sensor readings exceed the given percentage-threshold. Discovering flow anomalies (FA) is an important problem in environmental flow monitoring networks and early warning detection systems for water quality problems. However, mining FAs is computationally expensive because of the large (potentially infinite) number of time instants of measurement and potentially long delays due to stagnant (e.g. lakes) or slow moving (e.g. wetland) water bodies between consecutive sensors. Traditional outlier detection methods (e.g. t-test) are suited for detecting transient FAs (i.e., time instants of significant mis-matches across consecutive sensors) and cannot detect persistent FAs (i.e., long variable time-windows with a high fraction of time instant transient FAs) due to a lack of a pre-defined window size. In contrast, we propose a Smart Window Enumeration and Evaluation of persistence-Thresholds (SWEET) method to efficiently explore the search space of all possible window lengths. Computation overhead is brought down significantly by restricting the start and end points of a window to coincide with transient FAs, using a smart counter and efficient pruning techniques. Experimental evaluation using a real dataset shows our proposed approach outperforms Nainodotve alternatives.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131292643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiplicative Mixture Models for Overlapping Clustering","authors":"Qiang Fu, A. Banerjee","doi":"10.1109/ICDM.2008.103","DOIUrl":"https://doi.org/10.1109/ICDM.2008.103","url":null,"abstract":"The problem of overlapping clustering, where a point is allowed to belong to multiple clusters, is becoming increasingly important in a variety of applications. In this paper, we present an overlapping clustering algorithm based on multiplicative mixture models. We analyze a general setting where each component of the multiplicative mixture is from an exponential family, and present an efficient alternating maximization algorithm to learn the model and infer overlapping clusters. We also show that when each component is assumed to be a Gaussian, we can apply the kernel trick leading to non-linear cluster separators and obtain better clustering quality. The efficacy of the proposed algorithms is demonstrated using experiments on both UCI benchmark datasets and a microarray gene expression dataset.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115240671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Recommendation System for Preconditioned Iterative Solvers","authors":"Thomas George, Anshul Gupta, V. Sarin","doi":"10.1109/ICDM.2008.105","DOIUrl":"https://doi.org/10.1109/ICDM.2008.105","url":null,"abstract":"Preconditioned iterative methods are often used to solve very large sparse systems of linear systems that arise in many scientific and engineering applications. The performance and robustness of these solvers is extremely sensitive to the choice of multiple preconditioner and solver parameters. Users of iterative methods often encounter an overwhelming number of combinations of choices for solvers, matrix preprocessing steps, preconditioners, and their parameters. The lack of a unified theoretical analysis of preconditioners coupled with limited knowledge of their interaction with linear systems makes it highly challenging for practitioners to choose good solver configurations. In this paper, we propose a novel, multi-stage learning based methodology for determining the best solver configurations to optimize the desired performance behavior for any given linear system. Empirical results over real performance data for the hyper iterative solver package demonstrate the efficacy and flexibility of the proposed approach.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114488041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evolutionary Clustering by Hierarchical Dirichlet Process with Hidden Markov State","authors":"Tianbing Xu, Zhongfei Zhang, Philip S. Yu, Bo Long","doi":"10.1109/ICDM.2008.24","DOIUrl":"https://doi.org/10.1109/ICDM.2008.24","url":null,"abstract":"This paper studies evolutionary clustering, which is a recently hot topic with many important applications, noticeably in social network analysis. In this paper, based on the recent literature on Hierarchical Dirichlet Process (HDP) and Hidden Markov Model (HMM), we have developed a statistical model HDP-HTM that combines HDP with a Hierarchical Transition Matrix (HTM) based on the proposed Infinite Hierarchical Hidden Markov State model (iH2MS) as an effective solution to this problem. The HDP-HTM model substantially advances the literature on evolutionary clustering in the sense that not only it performs better than the existing literature, but more importantly it is capable of automatically learning the cluster numbers and structures and at the same time explicitly addresses the correspondence issue during the evolution. Extensive evaluations have demonstrated the effectiveness and promise of this solution against the state-of-the-art literature.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129600164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Temporal-Relational Classifiers for Prediction in Evolving Domains","authors":"U. Sharan, Jennifer Neville","doi":"10.1109/ICDM.2008.125","DOIUrl":"https://doi.org/10.1109/ICDM.2008.125","url":null,"abstract":"Many relational domains contain temporal information and dynamics that are important to model (e.g., social networks, protein networks). However, past work in relational learning has focused primarily on modeling static \"snapshots\" of the data and has largely ignored the temporal dimension of these data. In this work, we extend relational techniques to temporally-evolving domains and outline a representational framework that is capable of modeling both temporal and relational dependencies in the data. We develop efficient learning and inference techniques within the framework by considering a restricted set of temporal-relational dependencies and using parameter-tying methods to generalize across relationships and entities. More specifically, we model dynamic relational data with a two-phase process, first summarizing the temporal-relational information with kernel smoothing, and then moderating attribute dependencies with the summarized relational information. We develop a number of novel temporal-relational models using the framework and then show that the current approaches to modeling static relational data are special cases within the framework. We compare the new models to the competing static relational methods on three real-world datasets and show that the temporal-relational models consistently outperform the relational models that ignore temporal information - achieving significant reductions in error ranging from 15% to 70%.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130521689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Robust Discriminative Term Weighting Based Linear Discriminant Method for Text Classification","authors":"K. N. Junejo, Asim Karim","doi":"10.1109/ICDM.2008.26","DOIUrl":"https://doi.org/10.1109/ICDM.2008.26","url":null,"abstract":"Text classification is widely used in applications ranging from e-mail filtering to review classification. Many of these applications demand that the classification method be efficient and robust, yet produce accurate categorizations by using the terms in the documents only. We present a supervised text classification method based on discriminative term weighting, discrimination information pooling, and linear discrimination. Terms in the documents are assigned weights according to the discrimination information they provide for one category over the others. These weights also serve to partition the terms into two sets. A linear opinion pool is adopted for combining the discrimination information provided by each set of terms yielding a two-dimensional feature space. Subsequently, a linear discriminant function is learned to categorize the documents in the feature space. We provide intuitive and empirical evidence of the robustness of our method with three term weighting strategies. Experimental results are presented for data sets from three different application areas. The results show that our method's accuracy is higher than other popular methods, especially when there is a distribution shift from training to testing sets. Moreover, our method is simple yet robust to different application domains and small training set sizes.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127940619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-supervised Learning from General Unlabeled Data","authors":"Kaizhu Huang, Zenglin Xu, Irwin King, Michael R. Lyu","doi":"10.1109/ICDM.2008.61","DOIUrl":"https://doi.org/10.1109/ICDM.2008.61","url":null,"abstract":"We consider the problem of semi-supervised learning (SSL) from general unlabeled data, which may contain irrelevant samples. Within the binary setting, our model manages to better utilize the information from unlabeled data by formulating them as a three-class (-1,+1, 0) mixture, where class 0 represents the irrelevant data. This distinguishes our work from the traditional SSL problem where unlabeled data are assumed to contain relevant samples only, either +1 or -1, which are forced to be the same as the given labeled samples. This work is also different from another family of popular models, universum learning (universum means \"irrelevant\" data), in that the universum need not to be specified beforehand. One significant contribution of our proposed framework is that such irrelevant samples can be automatically detected from the available unlabeled data, even though they are mixed with relevant data. This hence presents a general SSL framework that does not force \"clean\" unlabeled data.More importantly, we formulate this general learning framework as a Semi-definite Programming problem, making it solvable in polynomial time. A series of experiments demonstrate that the proposed framework can outperform the traditional SSL on both synthetic and real data.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125416407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Latent Dirichlet Allocation and Singular Value Decomposition Based Multi-document Summarization","authors":"Rachit Arora, Balaraman Ravindran","doi":"10.1109/ICDM.2008.55","DOIUrl":"https://doi.org/10.1109/ICDM.2008.55","url":null,"abstract":"Multi-Document Summarization deals with computing a summary for a set of related articles such that they give the user a general view about the events. One of the objectives is that the sentences should cover the different events in the documents with the information covered in as few sentences as possible. Latent Dirichlet Allocation can breakdown these documents into different topics or events. However to reduce the common information content the sentences of the summary need to be orthogonal to each other since orthogonal vectors have the lowest possible similarity and correlation between them. Singular Value Decompositions used to get the orthogonal representations of vectors and representing sentences as vectors, we can get the sentences that are orthogonal to each other in the LDA mixture model weighted term domain. Thus using LDA we find the different topics in the documents and using SVD we find the sentences that best represent these topics. Finally we present the evaluation of the algorithms on the DUC2002 Corpus multi-document summarization tasks using the ROUGE evaluator to evaluate the summaries. Compared to DUC 2002 winners, our algorithms gave significantly better ROUGE-1 recall measures.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123252365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Collaborative Filtering for Implicit Feedback Datasets","authors":"Yifan Hu, Y. Koren, C. Volinsky","doi":"10.1109/ICDM.2008.22","DOIUrl":"https://doi.org/10.1109/ICDM.2008.22","url":null,"abstract":"A common task of recommender systems is to improve customer experience through personalized recommendations based on prior implicit feedback. These systems passively track different sorts of user behavior, such as purchase history, watching habits and browsing activity, in order to model user preferences. Unlike the much more extensively researched explicit feedback, we do not have any direct input from the users regarding their preferences. In particular, we lack substantial evidence on which products consumer dislike. In this work we identify unique properties of implicit feedback datasets. We propose treating the data as indication of positive and negative preference associated with vastly varying confidence levels. This leads to a factor model which is especially tailored for implicit feedback recommenders. We also suggest a scalable optimization procedure, which scales linearly with the data size. The algorithm is used successfully within a recommender system for television shows. It compares favorably with well tuned implementations of other known methods. In addition, we offer a novel way to give explanations to recommendations given by this factor model.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121469043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Face Annotation by Mining the Web","authors":"Duy-Dinh Le, S. Satoh","doi":"10.1109/ICDM.2008.47","DOIUrl":"https://doi.org/10.1109/ICDM.2008.47","url":null,"abstract":"Searching for images of people is an essential task for image and video search engines. However, current search engines have limited capabilities for this task since they rely on text associated with images and video, and such text is likely to return many irrelevant results. We propose a method for retrieving relevant faces of one person by learning the visual consistency among results retrieved from text correlation-based search engines. The method consists of two steps. In the first step, each candidate face obtained from a text-based search engine is ranked with a score that measures the distribution of visual similarities among the faces. Faces that are possibly very relevant or irrelevant are ranked at the top or bottom of the list, respectively. The second step improves this ranking by treating this problem as a classification problem in which input faces are classified as psilaperson-Xpsila or psilanon-person-Xpsila; and the faces are re-ranked according to their relevant score inferred from the classifierpsilas probability output. To train this classifier, we use a bagging-based framework to combine results from multiple weak classifiers trained using different subsets. These training subsets are extracted and labeled automatically from the rank list produced from the classifier trained from the previous step. In this way, the accuracy of the ranked list increases after a number of iterations. Experimental results on various face sets retrieved from captions of news photos show that the retrieval performance improved after each iteration, with the final performance being higher than those of the existing algorithms.","PeriodicalId":252958,"journal":{"name":"2008 Eighth IEEE International Conference on Data Mining","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120911326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}