2011 IEEE 11th International Conference on Data Mining最新文献_第2页

On Generating All Optimal Monotone Classifications 关于生成所有最优单调分类的问题

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.111

Luite Stegeman, A. Feelders

引用次数: 4

Partitionable Kernels for Mapping Kernels 用于映射核的可分区核

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.115

Kilho Shin

{"title":"Partitionable Kernels for Mapping Kernels","authors":"Kilho Shin","doi":"10.1109/ICDM.2011.115","DOIUrl":"https://doi.org/10.1109/ICDM.2011.115","url":null,"abstract":"Many of tree kernels in the literature are designed tanking advantage of the mapping kernel framework. The most important advantage of using this framework is that we have a strong theorem to examine positive definiteness of the resulting tree kernels. In the mapping kernel framework, each data object is viewed as a collection of components, and a mapping kernel for a pair of data objects is determined as a sum of kernel values of component pairs over a certain range determined according to the purpose of use of the resulting mapping kernel. For those tree kernels known to belong to the mapping kernel category, the string kernel of the product type is commonly used to compute the kernel values of component pairs. This is because it is known that use of the product-type string kernel together with the mapping kernel framework allows us to have recursive formulas to calculate the resulting tree kernels efficiently. We significantly generalizes this result. In fact, we show that we can use partition able kernels, a new class of string kernels instead of the product-type string kernel to enjoy the same advantage, that is, efficient computation based on recursive formulas. The class of partition able kernels is abundant, and contains the product-type string kernels just as an instance. Also, this result, not limited to tree kernels, can be applied to general mapping kernels after we formalize the decomposition properties of trees as the new notion of pretty decomposability.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114156919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

Mining Dominant Patterns in the Sky 挖掘天空中的主导模式

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.100

Arnaud Soulet, Chedy Raïssi, M. Plantevit, B. Crémilleux

{"title":"Mining Dominant Patterns in the Sky","authors":"Arnaud Soulet, Chedy Raïssi, M. Plantevit, B. Crémilleux","doi":"10.1109/ICDM.2011.100","DOIUrl":"https://doi.org/10.1109/ICDM.2011.100","url":null,"abstract":"Pattern discovery is at the core of numerous data mining tasks. Although many methods focus on efficiency in pattern mining, they still suffer from the problem of choosing a threshold that influences the final extraction result. The goal of our study is to make the results of pattern mining useful from a user-preference point of view. To this end, we integrate into the pattern discovery process the idea of skyline queries in order to mine skyline patterns in a threshold-free manner. Because the skyline patterns satisfy a formal property of dominations, they not only have a global interest but also have semantics that are easily understood by the user. In this work, we first establish theoretical relationships between pattern condensed representations and skyline pattern mining. We also show that it is possible to compute automatically a subset of measures involved in the user query which allows the patterns to be condensed and thus facilitates the computation of the skyline patterns. This forms the basis for a novel approach to mining skyline patterns. We illustrate the efficiency of our approach over several data sets including a use case from chemo informatics and show that small sets of dominant patterns are produced under various measures.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"233 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121142313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 73

Improving Product Classification Using Images 利用图像改进产品分类

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.79

A. Kannan, P. Talukdar, Nikhil Rasiwasia, Qifa Ke

{"title":"Improving Product Classification Using Images","authors":"A. Kannan, P. Talukdar, Nikhil Rasiwasia, Qifa Ke","doi":"10.1109/ICDM.2011.79","DOIUrl":"https://doi.org/10.1109/ICDM.2011.79","url":null,"abstract":"Product classification in Commerce search (eg{} Google Product Search, Bing Shopping) involves associating categories to offers of products from a large number of merchants. The categorized offers are used in many tasks including product taxonomy browsing and matching merchant offers to products in the catalog. Hence, learning a product classifier with high precision and recall is of fundamental importance in order to provide high quality shopping experience. A product offer typically consists of a short textual description and an image depicting the product. Traditional approaches to this classification task is to learn a classifier using only the textual descriptions of the products. In this paper, we show that the use of images, a weaker signal in our setting, in conjunction with the textual descriptions, a more discriminative signal, can considerably improve the precision of the classification task, irrespective of the type of classifier being used. We present a novel classification approach, Cross Adapt{} (CrossAdaptAcro{}), that is cognizant of the disparity in the discriminative power of different types of signals and hence makes use of the confusion matrix of dominant signal (text in our setting) to prudently leverage the weaker signal (image), for an improved performance. Our evaluation performed on data from a major Commerce search engine's catalog shows a 12% (absolute) improvement in precision at 100% coverage, and a 16% (absolute) improvement in recall at 90% precision compared to classifiers that only use textual description of products. In addition, CrossAdaptAcro{} also provides a more accurate classifier based only on the dominant signal (text) that can be used in situations in which only the dominant signal is available during application time.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115967844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 29

Learning from Negative Examples in Set-Expansion 集展开中的反例学习

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.86

Prateek Jindal, D. Roth

引用次数: 11

A Robust Clustering Algorithm Based on Aggregated Heat Kernel Mapping 一种基于聚合热核映射的鲁棒聚类算法

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.15

Hao Huang, Shinjae Yoo, Hong Qin, Dantong Yu

引用次数: 18

Using Frequent Closed Itemsets for Data Dimensionality Reduction 使用频繁闭项集进行数据降维

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.154

Petr Krajča, Jan Outrata, Vilém Vychodil

引用次数: 11

Discovery of Versatile Temporal Subspace Patterns in 3-D Datasets 三维数据集中多用途时间子空间模式的发现

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.56

Zhen Hu, R. Bhatnagar

引用次数: 1

Semi-supervised Hierarchical Clustering 半监督分层聚类

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.130

Li Zheng, Tao Li

{"title":"Semi-supervised Hierarchical Clustering","authors":"Li Zheng, Tao Li","doi":"10.1109/ICDM.2011.130","DOIUrl":"https://doi.org/10.1109/ICDM.2011.130","url":null,"abstract":"Semi-supervised clustering (i.e., clustering with knowledge-based constraints) has emerged as an important variant of the traditional clustering paradigms. However, most existing semi-supervised clustering algorithms are designed for partitional clustering methods and few research efforts have been reported on semi-supervised hierarchical clustering methods. In addition, current semi-supervised clustering methods have been focused on the use of background information in the form of instance level must-link and cannot-link constraints, which are not suitable for hierarchical clustering where data objects are linked over different hierarchy levels. In this paper, we propose a novel semi-supervised hierarchical clustering framework based on ultra-metric dendrogram distance. The proposed framework is able to incorporate triple-wise relative constraints. We establish the connection between hierarchical clustering and ultra-metric transformation of dissimilarity matrix and propose two techniques (the constrained optimization technique and the transitive dissimilarity based technique) for semi-supervised hierarchical clustering. Experimental results demonstrate the effectiveness and the efficiency of our proposed methods.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131825394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 67

Cross-Temporal Link Prediction 跨时间链接预测

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.45

S. Oyama, K. Hayashi, H. Kashima

{"title":"Cross-Temporal Link Prediction","authors":"S. Oyama, K. Hayashi, H. Kashima","doi":"10.1109/ICDM.2011.45","DOIUrl":"https://doi.org/10.1109/ICDM.2011.45","url":null,"abstract":"The increasing interest in dynamically changing networks has led to growing interest in a more general link prediction problem called temporal link prediction in the data mining and machine learning communities. However, only links in identical time frames are considered in temporal link prediction. We propose a new link prediction problem called cross-temporal link prediction in which the links among nodes in different time frames are inferred. A typical example of cross-temporal link prediction is cross-temporal entity resolution to determine the identity of real entities represented by data objects observed in different time periods. In dynamic environments, the features of data change over time, making it difficult to identify cross-temporal links by directly comparing observed data. Other examples of cross-temporal links are asynchronous communications in social networks such as Face book and Twitter, where a message is posted in reply to a previous message. We adopt a dimension reduction approach to cross-temporal link prediction, that is, data objects in different time frames are mapped into a common low-dimensional latent feature space, and the links are identified on the basis of the distance between the data objects. The proposed method uses different low-dimensional feature projections in different time frames, enabling it to adapt to changes in the latent features over time. Using multi-task learning, it jointly learns a set of feature projection matrices from the training data, given the assumption of temporal smoothness of the projections. The optimal solutions are obtained by solving a single generalized eigenvalue problem. Experiments using a real-world set of bibliographic data for cross-temporal entity resolution showed that introducing time-dependent feature projections improves the accuracy of link prediction.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131719765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 42