2011 IEEE 11th International Conference on Data Mining最新文献

筛选
英文 中文
On Generating All Optimal Monotone Classifications 关于生成所有最优单调分类的问题
2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.111
Luite Stegeman, A. Feelders
{"title":"On Generating All Optimal Monotone Classifications","authors":"Luite Stegeman, A. Feelders","doi":"10.1109/ICDM.2011.111","DOIUrl":"https://doi.org/10.1109/ICDM.2011.111","url":null,"abstract":"In many applications of data mining one knows beforehand that the response variable should be monotone (either increasing or decreasing) in the attributes. In ordinal classification, changing the class labels of a data set (relabeling) so that the data becomes monotone, is useful for at least two reasons. Firstly, models trained on relabeled data tend to have better predictive performance than models trained on the original data. Secondly, relabeling is an important building block for the construction of monotone classifiers. However, optimal monotone relabelings are rarely unique, and so far an efficient algorithm to generate them all has been lacking. The main result of this paper is an efficient algorithm to produce the structure of all optimal monotone relabelings. We also show that counting the solutions is #P-complete and give algorithms for efficiently enumerating all solutions, as well as sampling uniformly from the set of solutions. Experiments show that relabeling non-monotone data can improve the predictive performance of models trained on that data.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"101 9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126275279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Partitionable Kernels for Mapping Kernels 用于映射核的可分区核
2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.115
Kilho Shin
{"title":"Partitionable Kernels for Mapping Kernels","authors":"Kilho Shin","doi":"10.1109/ICDM.2011.115","DOIUrl":"https://doi.org/10.1109/ICDM.2011.115","url":null,"abstract":"Many of tree kernels in the literature are designed tanking advantage of the mapping kernel framework. The most important advantage of using this framework is that we have a strong theorem to examine positive definiteness of the resulting tree kernels. In the mapping kernel framework, each data object is viewed as a collection of components, and a mapping kernel for a pair of data objects is determined as a sum of kernel values of component pairs over a certain range determined according to the purpose of use of the resulting mapping kernel. For those tree kernels known to belong to the mapping kernel category, the string kernel of the product type is commonly used to compute the kernel values of component pairs. This is because it is known that use of the product-type string kernel together with the mapping kernel framework allows us to have recursive formulas to calculate the resulting tree kernels efficiently. We significantly generalizes this result. In fact, we show that we can use partition able kernels, a new class of string kernels instead of the product-type string kernel to enjoy the same advantage, that is, efficient computation based on recursive formulas. The class of partition able kernels is abundant, and contains the product-type string kernels just as an instance. Also, this result, not limited to tree kernels, can be applied to general mapping kernels after we formalize the decomposition properties of trees as the new notion of pretty decomposability.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114156919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Mining Dominant Patterns in the Sky 挖掘天空中的主导模式
2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.100
Arnaud Soulet, Chedy Raïssi, M. Plantevit, B. Crémilleux
{"title":"Mining Dominant Patterns in the Sky","authors":"Arnaud Soulet, Chedy Raïssi, M. Plantevit, B. Crémilleux","doi":"10.1109/ICDM.2011.100","DOIUrl":"https://doi.org/10.1109/ICDM.2011.100","url":null,"abstract":"Pattern discovery is at the core of numerous data mining tasks. Although many methods focus on efficiency in pattern mining, they still suffer from the problem of choosing a threshold that influences the final extraction result. The goal of our study is to make the results of pattern mining useful from a user-preference point of view. To this end, we integrate into the pattern discovery process the idea of skyline queries in order to mine skyline patterns in a threshold-free manner. Because the skyline patterns satisfy a formal property of dominations, they not only have a global interest but also have semantics that are easily understood by the user. In this work, we first establish theoretical relationships between pattern condensed representations and skyline pattern mining. We also show that it is possible to compute automatically a subset of measures involved in the user query which allows the patterns to be condensed and thus facilitates the computation of the skyline patterns. This forms the basis for a novel approach to mining skyline patterns. We illustrate the efficiency of our approach over several data sets including a use case from chemo informatics and show that small sets of dominant patterns are produced under various measures.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"233 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121142313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 73
Improving Product Classification Using Images 利用图像改进产品分类
2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.79
A. Kannan, P. Talukdar, Nikhil Rasiwasia, Qifa Ke
{"title":"Improving Product Classification Using Images","authors":"A. Kannan, P. Talukdar, Nikhil Rasiwasia, Qifa Ke","doi":"10.1109/ICDM.2011.79","DOIUrl":"https://doi.org/10.1109/ICDM.2011.79","url":null,"abstract":"Product classification in Commerce search (eg{} Google Product Search, Bing Shopping) involves associating categories to offers of products from a large number of merchants. The categorized offers are used in many tasks including product taxonomy browsing and matching merchant offers to products in the catalog. Hence, learning a product classifier with high precision and recall is of fundamental importance in order to provide high quality shopping experience. A product offer typically consists of a short textual description and an image depicting the product. Traditional approaches to this classification task is to learn a classifier using only the textual descriptions of the products. In this paper, we show that the use of images, a weaker signal in our setting, in conjunction with the textual descriptions, a more discriminative signal, can considerably improve the precision of the classification task, irrespective of the type of classifier being used. We present a novel classification approach, Cross Adapt{} (CrossAdaptAcro{}), that is cognizant of the disparity in the discriminative power of different types of signals and hence makes use of the confusion matrix of dominant signal (text in our setting) to prudently leverage the weaker signal (image), for an improved performance. Our evaluation performed on data from a major Commerce search engine's catalog shows a 12% (absolute) improvement in precision at 100% coverage, and a 16% (absolute) improvement in recall at 90% precision compared to classifiers that only use textual description of products. In addition, CrossAdaptAcro{} also provides a more accurate classifier based only on the dominant signal (text) that can be used in situations in which only the dominant signal is available during application time.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115967844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 29
Learning from Negative Examples in Set-Expansion 集展开中的反例学习
2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.86
Prateek Jindal, D. Roth
{"title":"Learning from Negative Examples in Set-Expansion","authors":"Prateek Jindal, D. Roth","doi":"10.1109/ICDM.2011.86","DOIUrl":"https://doi.org/10.1109/ICDM.2011.86","url":null,"abstract":"This paper addresses the task of set-expansion on free text. Set-expansion has been viewed as a problem of generating an extensive list of instances of a concept of interest, given a few examples of the concept as input. Our key contribution is that we show that the concept definition can be significantly improved by specifying some negative examples in the input, along with the positive examples. The state-of-the art centroid-based approach to set-expansion doesn't readily admit the negative examples. We develop an inference-based approach to set-expansion which naturally allows for negative examples and show that it performs significantly better than a strong baseline.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123881243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
A Robust Clustering Algorithm Based on Aggregated Heat Kernel Mapping 一种基于聚合热核映射的鲁棒聚类算法
2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.15
Hao Huang, Shinjae Yoo, Hong Qin, Dantong Yu
{"title":"A Robust Clustering Algorithm Based on Aggregated Heat Kernel Mapping","authors":"Hao Huang, Shinjae Yoo, Hong Qin, Dantong Yu","doi":"10.1109/ICDM.2011.15","DOIUrl":"https://doi.org/10.1109/ICDM.2011.15","url":null,"abstract":"Current spectral clustering algorithms suffer from both sensitivity to scaling parameter selection in similarity matrix construction, and data perturbation. This paper aims to improve robustness in clustering algorithms and combat these two limitations based on heat kernel theory. Heat kernel can statistically depict traces of random walk, so it has an intrinsic connection with diffusion distance, with which we can ensure robustness during any clustering process. By integrating heat distributed along time scale, we propose a novel method called Aggregated Heat Kernel (AHK) to measure the distance between each point pair in their eigen space. Using AHK and Laplace-Beltrami Normalization (LBN) we are able to apply an advanced noise-resisting robust spectral mapping to original dataset. Moreover it offers stability on scaling parameter tuning. Experimental results show that, compared to other popular spectral clustering methods, our algorithm can achieve robust clustering results on both synthetic and UCI real datasets.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131175383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 18
Using Frequent Closed Itemsets for Data Dimensionality Reduction 使用频繁闭项集进行数据降维
2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.154
Petr Krajča, Jan Outrata, Vilém Vychodil
{"title":"Using Frequent Closed Itemsets for Data Dimensionality Reduction","authors":"Petr Krajča, Jan Outrata, Vilém Vychodil","doi":"10.1109/ICDM.2011.154","DOIUrl":"https://doi.org/10.1109/ICDM.2011.154","url":null,"abstract":"We address important issues of dimensionality reduction of transactional data sets where the input data consists of lists of transactions, each of them being a finite set of items. The reduction consists in finding a small set of new items, so-called factor-items, which is considerably smaller than the original set of items while comprising full or nearly full information about the original items. Using this type of reduction, the original data set can be represented by a smaller transactional data set using factor-items instead of the original items, thus reducing its dimensionality. The procedure utilized in this paper is based on approximate Boolean matrix decomposition. In this paper, we focus on the role of frequent closed item sets that can be used to determine factor-items. We present the factorization problem, its reduction to Boolean matrix decompositions, experiments with publicly available data sets, and an algorithm for computing decompositions.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132750470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
Discovery of Versatile Temporal Subspace Patterns in 3-D Datasets 三维数据集中多用途时间子空间模式的发现
2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.56
Zhen Hu, R. Bhatnagar
{"title":"Discovery of Versatile Temporal Subspace Patterns in 3-D Datasets","authors":"Zhen Hu, R. Bhatnagar","doi":"10.1109/ICDM.2011.56","DOIUrl":"https://doi.org/10.1109/ICDM.2011.56","url":null,"abstract":"Most existing methods for clustering temporal data are based on either a strict similarity metric or a precisely defined temporal profile such as a sine, exponential wave etc. Also, these methods compute similarity metric across the entire time-span of the objects. However these types of temporal patterns are more useful in many biological analysis, where it is important to observe gene expression pattens across arbitrary subintervals. These types of temporal patterns are very useful in bioinformatics, where it is important to observe gene expression pattens across arbitrary subintervals. In this paper we present an algorithm for searching for multiple contiguous temporal subintervals within which the selected objects demonstrate existence of clear patterns. We demonstrate the power and advantages of our algorithm by using a synthetic dataset and a pharmacokinetics dataset for which other researchers have recently published their results. We compare and contrast our results with these results to show superiority of our approach.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134012651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Semi-supervised Hierarchical Clustering 半监督分层聚类
2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.130
Li Zheng, Tao Li
{"title":"Semi-supervised Hierarchical Clustering","authors":"Li Zheng, Tao Li","doi":"10.1109/ICDM.2011.130","DOIUrl":"https://doi.org/10.1109/ICDM.2011.130","url":null,"abstract":"Semi-supervised clustering (i.e., clustering with knowledge-based constraints) has emerged as an important variant of the traditional clustering paradigms. However, most existing semi-supervised clustering algorithms are designed for partitional clustering methods and few research efforts have been reported on semi-supervised hierarchical clustering methods. In addition, current semi-supervised clustering methods have been focused on the use of background information in the form of instance level must-link and cannot-link constraints, which are not suitable for hierarchical clustering where data objects are linked over different hierarchy levels. In this paper, we propose a novel semi-supervised hierarchical clustering framework based on ultra-metric dendrogram distance. The proposed framework is able to incorporate triple-wise relative constraints. We establish the connection between hierarchical clustering and ultra-metric transformation of dissimilarity matrix and propose two techniques (the constrained optimization technique and the transitive dissimilarity based technique) for semi-supervised hierarchical clustering. Experimental results demonstrate the effectiveness and the efficiency of our proposed methods.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131825394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 67
Cross-Temporal Link Prediction 跨时间链接预测
2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI: 10.1109/ICDM.2011.45
S. Oyama, K. Hayashi, H. Kashima
{"title":"Cross-Temporal Link Prediction","authors":"S. Oyama, K. Hayashi, H. Kashima","doi":"10.1109/ICDM.2011.45","DOIUrl":"https://doi.org/10.1109/ICDM.2011.45","url":null,"abstract":"The increasing interest in dynamically changing networks has led to growing interest in a more general link prediction problem called temporal link prediction in the data mining and machine learning communities. However, only links in identical time frames are considered in temporal link prediction. We propose a new link prediction problem called cross-temporal link prediction in which the links among nodes in different time frames are inferred. A typical example of cross-temporal link prediction is cross-temporal entity resolution to determine the identity of real entities represented by data objects observed in different time periods. In dynamic environments, the features of data change over time, making it difficult to identify cross-temporal links by directly comparing observed data. Other examples of cross-temporal links are asynchronous communications in social networks such as Face book and Twitter, where a message is posted in reply to a previous message. We adopt a dimension reduction approach to cross-temporal link prediction, that is, data objects in different time frames are mapped into a common low-dimensional latent feature space, and the links are identified on the basis of the distance between the data objects. The proposed method uses different low-dimensional feature projections in different time frames, enabling it to adapt to changes in the latent features over time. Using multi-task learning, it jointly learns a set of feature projection matrices from the training data, given the assumption of temporal smoothness of the projections. The optimal solutions are obtained by solving a single generalized eigenvalue problem. Experiments using a real-world set of bibliographic data for cross-temporal entity resolution showed that introducing time-dependent feature projections improves the accuracy of link prediction.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131719765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 42
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信