{"title":"Manifold Clustering of Shapes","authors":"Dragomir Yankov, Eamonn J. Keogh","doi":"10.1109/ICDM.2006.101","DOIUrl":"https://doi.org/10.1109/ICDM.2006.101","url":null,"abstract":"Shape clustering can significantly facilitate the automatic labeling of objects present in image collections. For example, it could outline the existing groups of pathological cells in a bank of cyto-images; the groups of species on photographs collected from certain aerials; or the groups of objects observed on surveillance scenes from an office building. Here we demonstrate that a nonlinear projection algorithm such as Isomap can attract together shapes of similar objects, suggesting the existence of isometry between the shape space and a low dimensional nonlinear embedding. Whenever there is a relatively small amount of noise in the data, the projection forms compact, convex clusters that can easily be learned by a subsequent partitioning scheme. We further propose a modification of the Isomap projection based on the concept of degree-bounded minimum spanning trees. The proposed approach is demonstrated to move apart bridged clusters and to alleviate the effect of noise in the data.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116517044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying Follow-Correlation Itemset-Pairs","authors":"Shichao Zhang, Jilian Zhang, Xiaofeng Zhu, Zifang Huang","doi":"10.1109/ICDM.2006.84","DOIUrl":"https://doi.org/10.1109/ICDM.2006.84","url":null,"abstract":"An association rule A→B is useful to predict that B will likely occur when A occurs. This is a classical association rule. In real-world applications, such as bioinformatics and medical research, there are many follow correlations between itemsets A and B: B likely occurs n times after A occurred m times, written as <Am, Bn>. We refer to this follow-correlation as P3.1 itemset-pairs because <A3, B1> like that in the example (Example 2) should be uninterested in association analysis. This paper designs an efficient algorithm for identifying P3.1 itemset-pairs in sequential data. We experimentally evaluate our approach, and demonstrate that the proposed approach is efficient and promising.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134132880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating Features from Different Sources for Music Information Retrieval","authors":"Tao Li, M. Ogihara, Shenghuo Zhu","doi":"10.1109/ICDM.2006.89","DOIUrl":"https://doi.org/10.1109/ICDM.2006.89","url":null,"abstract":"Efficient and intelligent music information retrieval is a very important topic of the 21st century. With the ultimate goal of building personal music information retrieval systems, this paper studies the problem of identifying \"similar\" artists using both lyrics and acoustic data. In this paper, we present a clustering algorithm that integrates features from both sources to perform bimodal learning. The algorithm is tested on a data set consisting of 570 songs from 53 albums of 41 artists using artist similarity provided by All Music Guide. Experimental results show that the accuracy of artist similarity classifiers can be significantly improved and that artist similarity can be efficiently identified.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134002341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining Latent Associations of Objects Using a Typed Mixture Model--A Case Study on Expert/Expertise Mining","authors":"Shenghua Bao, Yunbo Cao, B. Liu, Yong Yu, Hang Li","doi":"10.1109/ICDM.2006.109","DOIUrl":"https://doi.org/10.1109/ICDM.2006.109","url":null,"abstract":"This paper studies the problem of discovering latent associations among objects in text documents. Specifically, given two sets of objects and various types of co-occurrence data concerning the objects existing in texts, we aim to discover the hidden or latent associative relationships between the two sets of objects. Existing methods are not directly applicable as they are unable to consider all this information. For example, the probabilistic mixture model called Separable Mixture Model (SMM) proposed by Hofmann can use only one type of co-occurrences to mine latent associations. This paper proposes a more general probabilistic mixture model called the Typed Separable Mixture Model (TSMM), which is able to use all types of co-occurrences within a single framework. Experimental results based on the expert/expertise mining task show that TSMM outperforms SMM significantly.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133244545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Meta Clustering","authors":"R. Caruana, M. Elhawary, Nam Nguyen, Casey Smith","doi":"10.1109/ICDM.2006.103","DOIUrl":"https://doi.org/10.1109/ICDM.2006.103","url":null,"abstract":"Clustering is ill-defined. Unlike supervised learning where labels lead to crisp performance criteria such as accuracy and squared error, clustering quality depends on how the clusters will be used. Devising clustering criteria that capture what users need is difficult. Most clustering algorithms search for optimal clusterings based on a pre-specified clustering criterion. Our approach differs. We search for many alternate clusterings of the data, and then allow users to select the clustering(s) that best fit their needs. Meta clustering first finds a variety of clusterings and then clusters this diverse set of clusterings so that users must only examine a small number of qualitatively different clusterings. We present methods for automatically generating a diverse set of alternate clusterings, as well as methods for grouping clusterings into meta clusters. We evaluate meta clustering on four test problems and two case studies. Surprisingly, clusterings that would be of most interest to users often are not very compact clusterings.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133371129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"STAGGER: Periodicity Mining of Data Streams Using Expanding Sliding Windows","authors":"Mohamed G. Elfeky, Walid G. Aref, A. Elmagarmid","doi":"10.1109/ICDM.2006.153","DOIUrl":"https://doi.org/10.1109/ICDM.2006.153","url":null,"abstract":"Sensor devices are becoming ubiquitous, especially in measurement and monitoring applications. Because of the real-time, append-only and semi-infinite natures of the generated sensor data streams, an online incremental approach is a necessity for mining stream data types. In this paper, we propose STAGGER: a one-pass, online and incremental algorithm for mining periodic patterns in data streams. STAGGER does not require that the user pre-specify the periodicity rate of the data. Instead, STAGGER discovers the potential periodicity rates. STAGGER maintains multiple expanding sliding windows staggered over the stream, where computations are shared among the multiple overlapping windows. Small-length sliding windows are imperative for early and real-time output, yet are limited to discover short periodicity rates. As streamed data arrives continuously, the sliding windows expand in length in order to cover the whole stream. Larger-length sliding windows are able to discover longer periodicity rates. STAGGER incrementally maintains a tree-like data structure for the frequent periodic patterns of each discovered potential periodicity rate. In contrast to the Fourier/Wavelet-based approaches used for discovering periodicity rates, STAGGER not only discovers a wider, more accurate set of periodicities, but also discovers the periodic patterns themselves. In fact, experimental results with real and synthetic data sets show that STAGGER outperforms Fourier/Wavelet-based approaches by an order of magnitude in terms of the accuracy of the discovered periodicity rates. Moreover, real-data experiments demonstrate the practicality of the discovered periodic patterns.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"182 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116202227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Balanced Ensemble Approach to Weighting Classifiers for Text Classification","authors":"G. Fung, J. Yu, Haixun Wang, D. Cheung, Huan Liu","doi":"10.1109/ICDM.2006.2","DOIUrl":"https://doi.org/10.1109/ICDM.2006.2","url":null,"abstract":"This paper studies the problem of constructing an effective heterogeneous ensemble classifier for text classification. One major challenge of this problem is to formulate a good combination function, which combines the decisions of the individual classifiers in the ensemble. We show that the classification performance is affected by three weight components and they should be included in deriving an effective combination function. They are: (1) Global effectiveness, which measures the effectiveness of a member classifier in classifying a set of unseen documents; (2) Local effectiveness, which measures the effectiveness of a member classifier in classifying the particular domain of an unseen document; and (3) Decision confidence, which describes how confident a classifier is when making a decision when classifying a specific unseen document. We propose a new balanced combination function, called dynamic classifier weighting (DCW), that incorporates the aforementioned three components. The empirical study demonstrates that the new combination function is highly effective for text classification.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122019707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Interactive Semantic Video Mining and Retrieval Platform--Application in Transportation Surveillance Video for Incident Detection","authors":"Xin Chen, Chengcui Zhang","doi":"10.1109/ICDM.2006.20","DOIUrl":"https://doi.org/10.1109/ICDM.2006.20","url":null,"abstract":"Understanding and retrieving videos based on their semantic contents is an important research topic in multimedia data mining and has found various real-world applications. Most existing video analysis techniques focus on the low level visual features of video data. However, there is a \"semantic gap\" between the machine-readable features and the high level human concepts i.e. human understanding of the video content. In this paper, an interactive platform for semantic video mining and retrieval is proposed using relevance feedback (RF), a popular technique in the area of content-based image retrieval (CBIR). By tracking semantic objects in a video and then modeling spatio-temporal events based on object trajectories and object interactions, the proposed interactive learning algorithm in the platform is able to mine the spatio-temporal data extracted from the video. An iterative learning process is involved in the proposed platform, which is guided by the user's response to the retrieved results. Although the proposed video retrieval platform is intended for general use and can be tailored to many applications, we focus on its application in traffic surveillance video database retrieval to demonstrate the design details. The effectiveness of the algorithm is demonstrated by our experiments on real-life traffic surveillance videos.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124838853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Direct Marketing When There Are Voluntary Buyers","authors":"Yi-Ting Lai, Ke Wang, Daymond Ling, Hua Shi, Jason J. Zhang","doi":"10.1109/ICDM.2006.54","DOIUrl":"https://doi.org/10.1109/ICDM.2006.54","url":null,"abstract":"In traditional direct marketing, the implicit assumption is that customers will only purchase the product if they are contacted. In real business environments, however, there are \"voluntary buyers,\" who will still make the purchase in the absence of a contact. While no direct promotion is needed for voluntary buyers, the traditional response-driven paradigm tends to target such customers. This paper presents \"influential marketing,\" targeting only those whose purchase decisions can be positively influenced, i.e. buyers who are non-voluntary. Our novel, practical solution to this problem gives promising results.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130089082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploratory Under-Sampling for Class-Imbalance Learning","authors":"Xu-Ying Liu, Jianxin Wu, Zhi-Hua Zhou","doi":"10.1109/ICDM.2006.68","DOIUrl":"https://doi.org/10.1109/ICDM.2006.68","url":null,"abstract":"Under-sampling is a class-imbalance learning method which uses only a subset of major class examples and thus is very efficient. The main deficiency is that many major class examples are ignored. We propose two algorithms to overcome the deficiency. EasyEnsemble samples several subsets from the major class, trains a learner using each of them, and combines the outputs of those learners. BalanceCascade is similar to EasyEnsemble except that it removes correctly classified major class examples of trained learners from further consideration. Experiments show that both of the proposed algorithms have better AUC scores than many existing class-imbalance learning methods. Moreover, they have approximately the same training time as that of under-sampling, which trains significantly faster than other methods.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130419350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}