Seventh IEEE International Conference on Data Mining (ICDM 2007): Latest Publications

Parallel Mining of Frequent Closed Patterns: Harnessing Modern Computer Architectures
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.13
C. Lucchese, S. Orlando, R. Perego
Abstract: Inspired by emerging multi-core computer architectures, in this paper we present MT_CLOSED, a multi-threaded algorithm for frequent closed itemset mining (FCIM). To the best of our knowledge, this is the first parallel FCIM algorithm proposed so far. We studied how different duplicate-checking techniques, typical of FCIM algorithms, may affect this parallelization, and showed that only one of them allows the global FCIM problem to be decomposed into independent tasks that can be executed in any order, and thus in parallel. Finally, we show how MT_CLOSED efficiently harnesses modern CPUs. We designed and tested several parallelization paradigms by investigating static/dynamic decomposition and scheduling of tasks, thus showing its scalability w.r.t. the number of CPUs. We analyzed the cache friendliness of the algorithm, and provided additional speed-up by introducing SIMD extensions.
Citations: 42
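The notion this abstract builds on, a frequent itemset is *closed* if no proper superset has the same support, can be illustrated with a brute-force miner. This sketch is for illustration only and bears no relation to MT_CLOSED's efficient parallel strategy; the function name and the tiny dataset are invented:

```python
from itertools import combinations

def closed_itemsets(transactions, min_support):
    """Brute-force frequent closed itemset mining (exponential; toy use only)."""
    items = sorted({i for t in transactions for i in t})
    # Compute the support of every frequent itemset.
    support = {}
    for r in range(1, len(items) + 1):
        for cand in combinations(items, r):
            s = sum(1 for t in transactions if set(cand) <= t)
            if s >= min_support:
                support[frozenset(cand)] = s
    # Closed = no frequent proper superset has equal support.
    return {iset: s for iset, s in support.items()
            if not any(iset < other and support[other] == s for other in support)}

transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}]
closed = closed_itemsets(transactions, min_support=2)
# {a} (support 3), {a,b} (2) and {a,c} (2) are closed; {b} and {c} are not,
# since {a,b} / {a,c} have the same support as they do.
```

Duplicate checking matters because the same closed itemset is typically reachable from several different non-closed generators; the abstract's point is that only one of the standard checks keeps those reachability tests local to a task.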
A Text Classification Framework with a Local Feature Ranking for Learning Social Networks
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.26
M. Makrehchi, M. Kamel
Abstract: In this paper, a text classifier framework with a feature ranking scheme is proposed to extract social structures from text data. It is assumed that only a small subset of the relations between the individuals in a community is known. Under this assumption, social network extraction is translated into a classification problem. The relation between two individuals is represented by merging their document vectors, and the given relations are used as labels of the training data. With this transformation, a text classifier such as Rocchio can be used to learn the unknown relations. We show that there is a link between the intrinsic sparsity of social networks and class imbalance. Furthermore, we show that feature ranking methods usually fail on problems with unbalanced data. To deal with this deficiency and re-balance the unbalanced social data, a local feature ranking method, called reverse discrimination, is proposed.
Citations: 6
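The pipeline the abstract outlines, represent a candidate relation by merging two individuals' document vectors, then classify it with Rocchio (nearest class centroid), might look like the following sketch. The element-wise-sum merge operator and all names are assumptions of mine, not necessarily the paper's:

```python
import numpy as np

def merge(u, v):
    # Merge two individuals' document vectors into one relation vector.
    # Element-wise sum is an assumed merge operator.
    return u + v

def rocchio_fit(X, y):
    # Rocchio classifier: one centroid per class.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def rocchio_predict(centroids, x):
    # Assign to the nearest class centroid (Euclidean distance).
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

# Toy relation vectors labeled 1 (related) / 0 (unrelated).
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
y = np.array([1, 1, 0, 0])
centroids = rocchio_fit(X, y)
label = rocchio_predict(centroids, merge(np.array([0.5, 0.0]),
                                         np.array([0.4, 0.1])))  # → 1
```

The class-imbalance link is visible even here: in a sparse network, almost every pair is "unrelated", so the class-0 centroid dominates unless features are re-ranked locally.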
Lazy Bagging for Classifying Imbalanced Data
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.95
Xingquan Zhu
Abstract: In this paper, we propose a lazy bagging (LB) design, which builds bootstrap replicate bags based on the characteristics of the test instances. Upon receiving a test instance Ik, LB trims the bootstrap bags by taking Ik's nearest neighbors in the training set into consideration. Our hypothesis is that an unlabeled instance's nearest neighbors provide valuable information for learners to refine their local decision boundaries for classifying this instance. By taking full advantage of Ik's nearest neighbors, the base learners incur less bias and variance in classifying Ik. This strategy is beneficial for classifying imbalanced data, because refining local decision boundaries helps a learner reduce its inherent bias towards the majority class and improve its performance on minority-class examples. Our experimental results confirm that LB outperforms C4.5 and TB in terms of reducing classification error, and, most importantly, this error reduction comes largely from LB's improvement on minority-class examples.
Citations: 31
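The bag construction described above, building bootstrap bags per test instance and biasing them with its nearest neighbors, can be sketched as follows. Forcing the k neighbors into every bag is my simplification of LB's trimming step, and all names are invented:

```python
import random
import numpy as np

def lazy_bags(X_train, x_test, n_bags=3, bag_size=8, k=2, seed=0):
    """Sketch of per-test-instance bag construction: each bootstrap bag
    is seeded with the test instance's k nearest training neighbors,
    then filled by ordinary sampling with replacement."""
    rng = random.Random(seed)
    dists = np.linalg.norm(X_train - x_test, axis=1)
    nn = list(np.argsort(dists)[:k])      # indices of the k nearest neighbors
    bags = []
    for _ in range(n_bags):
        bag = list(nn)                    # force the neighbors into the bag
        bag += rng.choices(range(len(X_train)), k=bag_size - k)
        bags.append(bag)
    return bags
```

A base learner (e.g. a decision tree) would then be trained on each bag and the votes aggregated, exactly as in ordinary bagging; only the bag contents are test-instance-specific.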
Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.90
Robert M. Bell, Y. Koren
Abstract: Recommender systems based on collaborative filtering predict user preferences for products or services by learning past user-item relationships. A predominant approach to collaborative filtering is neighborhood based ("k-nearest neighbors"), where a user-item preference rating is interpolated from ratings of similar items and/or users. We enhance the neighborhood-based approach, leading to a substantial improvement in prediction accuracy without a meaningful increase in running time. First, we remove certain so-called "global effects" from the data to make the ratings more comparable, thereby improving interpolation accuracy. Second, we show how to simultaneously derive interpolation weights for all nearest neighbors, unlike previous approaches where each weight is computed separately. By globally solving a suitable optimization problem, this simultaneous interpolation accounts for the many interactions between neighbors, leading to improved accuracy. Our method is very fast in practice, generating a prediction in about 0.2 milliseconds. Importantly, it does not require training many parameters or lengthy preprocessing, making it very practical for large-scale applications. Finally, we show how to apply these methods to the perceivably much slower user-oriented approach. To this end, we suggest a novel scheme for low-dimensional embedding of the users. We evaluate these methods on the Netflix dataset, where they deliver significantly better results than the commercial Netflix Cinematch recommender system.
Citations: 574
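The "jointly derived weights" idea can be sketched as one least-squares solve over all neighbors at once, instead of computing each neighbor's weight from a pairwise similarity in isolation. This is a simplification under invented names; the actual method solves a linear system built from rating second moments, with further shrinkage for sparsely observed pairs:

```python
import numpy as np

def joint_weights(R_neighbors, r_target):
    """Find w minimizing ||R_neighbors @ w - r_target||^2: one global
    solve, so interactions between neighbors are accounted for jointly."""
    w, *_ = np.linalg.lstsq(R_neighbors, r_target, rcond=None)
    return w

# Rows: users who rated both the target item and its 2 neighbor items.
R = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
r = np.array([1.0, 0.0, 1.0])
w = joint_weights(R, r)          # ≈ [1, 0]: neighbor 2 is jointly redundant
pred = w @ np.array([4.0, 2.0])  # predicted rating ≈ 4.0 for a new user
```

Note how the joint solve drives the second neighbor's weight to (approximately) zero because the first neighbor already explains the target's ratings, something independent pairwise similarities cannot detect.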
Connections between Mining Frequent Itemsets and Learning Generative Models
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.83
S. Laxman, P. Naldurg, Raja Sripada, R. Venkatesan
Abstract: Frequent itemset mining is a popular framework for pattern discovery. In this framework, given a database of customer transactions, the task is to unearth all patterns in the form of sets of items appearing in a sizable number of transactions. We present a class of models called Itemset Generating Models (IGMs) that can be used to formally connect the process of frequent itemset discovery with the learning of generative models. IGMs are specified using simple probability mass functions (over the space of transactions), peaked at specific sets of items and uniform everywhere else. Under such a connection, it is possible to rigorously associate higher-frequency patterns with generative models that have greater data likelihoods. This enables a generative model-learning interpretation of frequent itemset mining. More importantly, it facilitates a statistical significance test which prescribes the minimum frequency needed for a pattern to be considered interesting. We illustrate the effectiveness of our analysis through experiments on standard benchmark data sets.
Citations: 8
Bandit-Based Algorithms for Budgeted Learning
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.91
Kun Deng, Chris Bourke, S. Scott, Julie Sunderman, Yaling Zheng
Abstract: We explore the problem of budgeted machine learning, in which the learning algorithm has free access to the training examples' labels but has to pay for each attribute that is specified. This learning model is appropriate in many areas, including medical applications. We present new algorithms, based on algorithms for the multi-armed bandit problem, for choosing which attributes of which examples to purchase in the budgeted learning model. All of our approaches outperformed the current state of the art. Furthermore, we present a new means of selecting an example to purchase after the attribute is selected, instead of selecting an example uniformly at random, as is typically done. Our new example selection method improved the performance of all the algorithms we tested, both ours and those in the literature.
Citations: 21
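A bandit strategy for attribute purchases, as the abstract describes, can be sketched with UCB1: treat each attribute as an arm, and use some measure of how informative past purchases of that attribute were as the reward. The choice of UCB1 and the reward proxy are assumptions of mine, not necessarily what the paper uses:

```python
import math

def ucb_pick(counts, rewards, t):
    """UCB1 arm choice. counts[a]: purchases of attribute a so far;
    rewards[a]: summed payoff (an informativeness proxy); t: total
    purchases so far. Balances exploiting attributes that paid off
    with exploring rarely purchased ones."""
    for a in range(len(counts)):
        if counts[a] == 0:
            return a                        # try every attribute once first
    return max(range(len(counts)),
               key=lambda a: rewards[a] / counts[a]
                             + math.sqrt(2 * math.log(t) / counts[a]))

print(ucb_pick([0, 2], [0.0, 1.0], 2))      # → 0 (untried attribute wins)
print(ucb_pick([5, 5], [5.0, 1.0], 10))     # → 0 (higher mean, equal bonus)
```

The budgeted-learning twist is that pulling an arm here costs real budget, so the reward proxy (e.g. improvement in a classifier's internal score) matters more than in the classic bandit setting.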
Noise Modeling with Associative Corruption Rules
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.28
Yan Zhang, Xindong Wu
Abstract: This paper presents an active learning approach to the problem of systematic noise inference and noise elimination, specifically the inference of Associative Corruption (AC) rules. AC rules are defined to simulate a common noise formation process in real-world data, in which the occurrence of an error on one attribute depends on several other attribute values. Our approach consists of two algorithms, Associative Corruption Forward (ACF) and Associative Corruption Backward (ACB). ACF is proposed for noise inference, and ACB is designed for noise elimination. The experimental results show that ACF can infer the noise formation correctly, and that ACB indeed enhances data quality for supervised learning.
Citations: 6
Spectral Regression: A Unified Approach for Sparse Subspace Learning
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.89
Deng Cai, Xiaofei He, Jiawei Han
Abstract: Recently, the problem of dimensionality reduction (or subspace learning) has received a lot of interest in many fields of information processing, including data mining, information retrieval, and pattern recognition. Popular methods include principal component analysis (PCA), linear discriminant analysis (LDA), and locality preserving projection (LPP). However, a disadvantage of all these approaches is that the learned projective functions are linear combinations of all the original features, so it is often difficult to interpret the results. In this paper, we propose a novel dimensionality reduction framework, called Unified Sparse Subspace Learning (USSL), for learning sparse projections. USSL casts the problem of learning the projective functions into a regression framework, which facilitates the use of different kinds of regularizers. By using an L1-norm regularizer (lasso), the sparse projections can be computed efficiently. Experimental results on real-world classification and clustering problems demonstrate the effectiveness of our method.
Citations: 209
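The lasso step at the heart of this framework, fitting a projection vector under an L1 penalty so that most of its entries are exactly zero, can be sketched with plain ISTA (iterative soft-thresholding). In the actual method the target y would come from a spectral embedding of the data graph; this generic solver and its names are my stand-in:

```python
import numpy as np

def lasso_ista(X, y, lam=0.1, n_iter=500):
    """ISTA for min_a 0.5*||X a - y||^2 + lam*||a||_1, yielding a
    sparse projection vector a."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2           # Lipschitz constant of the gradient
    a = np.zeros(d)
    for _ in range(n_iter):
        g = X.T @ (X @ a - y)               # gradient of the smooth part
        z = a - g / L                       # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

# Feature 0 explains y; feature 1 is dead. The lasso zeroes it out exactly.
X = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
a = lasso_ista(X, np.array([1.0, 2.0, 3.0]), lam=0.1)
# a[0] ≈ 0.99 (slightly shrunk), a[1] == 0.0 exactly
```

Exact zeros are the point: unlike an L2 penalty, the soft-threshold step makes the learned projection depend on only a few original features, which is what makes the subspace interpretable.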
Topical N-Grams: Phrase and Topic Discovery, with an Application to Information Retrieval
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.86
Xuerui Wang, A. McCallum, Xing Wei
Abstract: Most topic models, such as latent Dirichlet allocation, rely on the bag-of-words assumption. However, word order and phrases are often critical to capturing the meaning of text in many text mining tasks. This paper presents topical n-grams, a topic model that discovers topics as well as topical phrases. The probabilistic model generates words in their textual order by, for each word, first sampling a topic, then sampling its status as a unigram or bigram, and then sampling the word from a topic-specific unigram or bigram distribution. Thus our model can treat "white house" as a phrase with a special meaning in the 'politics' topic, but not in the 'real estate' topic. Successive bigrams form longer phrases. We present experiments showing meaningful phrases and more interpretable topics from the NIPS data, and improved information retrieval performance on a TREC collection.
Citations: 519
A Pairwise Covariance-Preserving Projection Method for Dimension Reduction
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.65
Xiaoming Liu, Zhaohui Wang, Zhilin Feng, Jinshan Tang
Abstract: Dimension reduction is critical in many areas of pattern classification and machine learning, and many discriminant analysis algorithms have been proposed. In this paper, a Pairwise Covariance-preserving Projection Method (PCPM) is proposed for dimension reduction. PCPM maximizes class discrimination while approximately preserving the pairwise class covariances. The optimization involved in PCPM can be solved directly by eigenvalue decomposition. Our theoretical and empirical analysis reveals the relationship between PCPM and Linear Discriminant Analysis (LDA), the Sliced Average Variance Estimator (SAVE), Heteroscedastic Discriminant Analysis (HDA), and the Covariance Preserving Projection Method (CPM). PCPM can utilize class mean and class covariance information at the same time. Furthermore, a pairwise weighting scheme can be incorporated naturally with the pairwise summation form. The proposed method is evaluated on both synthetic and real-world datasets.
Citations: 4