2010 IEEE International Conference on Data Mining: Latest Publications

Compressed Nonnegative Sparse Coding
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.162
Fei Wang, Ping Li
Abstract: Sparse Coding (SC), which models data vectors as sparse linear combinations of basis vectors, has been widely applied in machine learning, signal processing, and neuroscience. In this paper, we propose a dual random projection method that provides an efficient, memory-light solution to Nonnegative Sparse Coding (NSC). Experiments on real-world data demonstrate the effectiveness of the proposed method.
Citations: 9
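The NSC objective that the compression targets can be sketched with standard multiplicative updates. This is a generic baseline, assuming the usual ||X - WH||^2 + lambda * sum(H) formulation with W, H >= 0; it is not the paper's dual-random-projection solver, and all function names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def nsc(X, r, lam=0.1, iters=200):
    """Nonnegative sparse coding by multiplicative updates:
    minimize ||X - W H||_F^2 + lam * sum(H), with W, H >= 0.
    The lam penalty pushes the coefficient matrix H toward sparsity."""
    m, n = X.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    eps = 1e-9  # guards against division by zero
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + lam + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

X = rng.random((10, 8))   # toy nonnegative data matrix
W, H = nsc(X, r=3)
```

The paper's contribution is to solve this kind of problem after randomly projecting the data to a much smaller matrix; these plain updates assume the uncompressed, fully nonnegative X.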
A Novel Contrast Co-learning Framework for Generating High Quality Training Data
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.23
Zeyu Zheng, Jun Yan, Shuicheng Yan, Ning Liu, Zheng Chen, Ming Zhang
Abstract: The good performance of most classical learning algorithms is generally founded on high-quality training data that are clean and unbiased. Such data are, however, becoming much harder to obtain in many real-world problems, due to the difficulty of collecting large-scale unbiased data and labeling them precisely. In this paper, we propose a general Contrast Co-learning (CCL) framework to refine biased and noisy training data when an unbiased yet unlabeled data pool is available. CCL starts with multiple sets of possibly biased and noisy training data and trains a set of classifiers individually. Then, under the assumption that confidently classified samples have a higher probability of being correctly classified, CCL iteratively and automatically filters out likely data noise and adds confidently classified samples from the unlabeled pool to correct the bias. Through this process, we can generate a cleaner, unbiased training dataset with theoretical guarantees. Extensive experiments on two public text datasets clearly show that CCL consistently improves classification performance on biased and noisy training data compared with several state-of-the-art classical algorithms.
Citations: 1
One-Class Matrix Completion with Low-Density Factorizations
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.164
Vikas Sindhwani, S. Bucak, Jianying Hu, A. Mojsilovic
Abstract: Consider a typical recommendation problem. A company has historical records of products sold to a large customer base. These records may be compactly represented as a sparse customer-by-product "who-bought-what" binary matrix. Given this matrix, the goal is to build a model that recommends which products should be sold next to the existing customers. Such problems can naturally be formulated as collaborative filtering tasks. However, this is a one-class setting: the only known entries in the matrix are one-valued. If a customer has not yet bought a product, that does not imply the customer has a low propensity to potentially be interested in it. In the absence of entries explicitly labeled as negative examples, one may resort to treating unobserved customer-product pairs either as missing data or as surrogate negative instances. In this paper, we propose an approach that deals with this ambiguity explicitly by instead treating the unobserved entries as optimization variables. These variables are optimized jointly with a weighted, low-rank nonnegative matrix factorization (NMF) of the customer-product matrix, similar to how Transductive SVMs implement the low-density separation principle for semi-supervised learning. Experimental results show that our approach gives significantly better recommendations than various competing alternatives on one-class collaborative filtering tasks.
Citations: 94
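A common baseline that this paper improves upon treats unobserved cells as weakly weighted zeros inside a weighted NMF. A minimal sketch of that baseline (the weights, rank, and function names are illustrative assumptions, not the paper's method, which instead optimizes the unobserved entries themselves):

```python
import numpy as np

rng = np.random.default_rng(1)

def weighted_nmf(X, C, r, iters=300):
    """Weighted NMF: minimize sum_ij C_ij * (X_ij - (U V^T)_ij)^2
    with U, V >= 0, via multiplicative updates. Observed purchases get
    weight 1; unobserved cells a small weight pulling them toward 0."""
    m, n = X.shape
    U = rng.random((m, r))
    V = rng.random((n, r))
    eps = 1e-9
    for _ in range(iters):
        P = U @ V.T
        U *= ((C * X) @ V) / ((C * P) @ V + eps)
        V *= ((C * X).T @ U) / ((C * P).T @ U + eps)
    return U, V

# toy "who-bought-what" matrix: 1 = purchase, 0 = unobserved
X = (rng.random((20, 15)) < 0.2).astype(float)
C = np.where(X == 1, 1.0, 0.05)   # down-weight unobserved cells
U, V = weighted_nmf(X, C, r=4)
scores = U @ V.T                  # rank unobserved cells for recommendation
```

The low weight on unobserved cells is exactly the surrogate-negative assumption the abstract questions; the paper's formulation avoids fixing those targets at zero.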
Interval-valued Matrix Factorization with Applications
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.115
Zhiyong Shen, Liang Du, Xukun Shen, Yi-Dong Shen
Abstract: In this paper, we propose the Interval-valued Matrix Factorization (IMF) framework. Matrix Factorization (MF) is a fundamental building block of data mining, and MF techniques such as Nonnegative Matrix Factorization (NMF) and Probabilistic Matrix Factorization (PMF) are widely used in its applications. For example, NMF has shown its advantages in Face Analysis (FA), while PMF has been successfully applied to Collaborative Filtering (CF). We analyze the data approximation in both FA and CF applications and construct interval-valued matrices to capture these approximation phenomena. We then adapt the basic NMF and PMF models to interval-valued matrices, yielding Interval-valued NMF (I-NMF) and Interval-valued PMF (I-PMF). Extensive experiments show that I-NMF and I-PMF significantly outperform their single-valued counterparts in FA and CF applications.
Citations: 4
Discovering Correlated Subspace Clusters in 3D Continuous-Valued Data
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.19
Kelvin Sim, Z. Aung, V. Gopalkrishnan
Abstract: Subspace clusters represent useful information in high-dimensional data. However, mining significant subspace clusters in continuous-valued 3D data, such as stock-financial ratio-year data or gene-sample-time data, is difficult. First, typical metrics either find subspaces with very few objects or find too many insignificant subspaces, i.e., those that exist by chance. Moreover, typical 3D subspace clustering approaches abound with parameters, which are usually set under biased assumptions, making the mining process a 'guessing game'. We address these concerns by proposing an information-theoretic measure that identifies 3D subspace clusters that stand out from the data. We also develop a highly effective, efficient, and parameter-robust algorithm, a hybrid of information-theoretic and statistical techniques, to mine these clusters. Extensive experiments show that our approach can discover significant 3D subspace clusters embedded in 110 synthetic datasets of varying conditions. We also perform a case study on real-world stock datasets, which shows that our clusters can generate higher profits than those mined by other approaches.
Citations: 35
Modeling Experts and Novices in Citizen Science Data for Species Distribution Modeling
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.103
Jun Yu, Weng-Keen Wong, R. Hutchinson
Abstract: Citizen scientists, volunteers from the community who participate as field assistants in scientific studies [3], enable research to be performed at much larger spatial and temporal scales than trained scientists can cover. Species distribution modeling [6], which involves understanding species-habitat relationships, is a research area that can benefit greatly from citizen science. The eBird project [16] is one of the largest citizen science programs in existence. By allowing birders to upload observations of bird species to an online database, eBird provides useful data for species distribution modeling. However, since birders vary in their level of expertise, the quality of data submitted to eBird is often questioned. In this paper, we develop a probabilistic model called the Occupancy-Detection-Expertise (ODE) model that incorporates the expertise of the birders submitting data to eBird. We show that modeling birder expertise improves the accuracy of predicting observations of a bird species at a site. In addition, the ODE model supports two other tasks: predicting a birder's expertise from their history of eBird checklists, and identifying bird species that are difficult for novices to detect.
Citations: 58
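The occupancy-detection backbone that the ODE model extends can be sketched with a toy generative simulation: a site is occupied with some probability, and an observer reports the species only if the site is occupied and they detect the bird. All probabilities and names below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate_checklists(n_sites, n_visits, psi, det_prob):
    """Toy occupancy-detection simulation: each site is occupied with
    probability psi (latent), and each visit detects the species with
    probability det_prob, but only at occupied sites."""
    occupied = rng.random(n_sites) < psi                  # latent occupancy
    detect = rng.random((n_sites, n_visits)) < det_prob   # per-visit detection
    return occupied[:, None] & detect                     # observed checklists

# experts detect more reliably than novices, so their checklists track
# the latent occupancy state more closely
obs_expert = simulate_checklists(1000, 5, psi=0.4, det_prob=0.9)
obs_novice = simulate_checklists(1000, 5, psi=0.4, det_prob=0.3)
```

The ODE model's point is that det_prob is not known and varies by birder; it infers expertise and occupancy jointly rather than fixing them as this sketch does.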
Testing the Significance of Patterns in Data with Cluster Structure
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.61
Niko Vuokko, P. Kaski
Abstract: Clustering is one of the basic operations in data analysis, and the cluster structure of a dataset often has a marked effect on the patterns observed in the data. Testing whether a data mining result is implied by the cluster structure can give substantial information about how the dataset was formed. We propose a new method for empirically testing the statistical significance of patterns in real-valued data relative to the cluster structure. The method relies on principal component analysis and is based on the general idea of decomposing the data in order to isolate the null model. We evaluate the method's performance and the information it provides on various real datasets. Our results show that the proposed method is robust and provides nontrivial information about the origin of patterns in data, such as the source of classification accuracy and the observed correlations between attributes.
Citations: 5
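The decompose-then-randomize idea can be illustrated with a simplified variant that fixes per-cluster means rather than PCA components: keep the structure, shuffle the residuals, and ask how extreme the observed statistic is under that null. The function names and smoothing constant are assumptions for this sketch, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(2)

def randomization_pvalue(X, labels, statistic, n_rand=500):
    """Empirical p-value for a pattern statistic, holding cluster
    structure fixed: keep per-cluster means, shuffle each residual
    column independently, and count how often the statistic is at
    least as extreme as observed."""
    means = np.zeros_like(X)
    for c in np.unique(labels):
        means[labels == c] = X[labels == c].mean(axis=0)
    resid = X - means
    obs = statistic(X)
    hits = 0
    for _ in range(n_rand):
        shuffled = np.column_stack(
            [rng.permutation(resid[:, j]) for j in range(X.shape[1])])
        if statistic(means + shuffled) >= obs:
            hits += 1
    return (hits + 1) / (n_rand + 1)  # add-one-smoothed p-value

# attribute correlation NOT explained by cluster structure should be
# significant (small p)
z = rng.normal(size=200)
X = np.column_stack([z, z + 0.1 * rng.normal(size=200)])
labels = (rng.random(200) < 0.5).astype(int)
p = randomization_pvalue(X, labels, lambda M: abs(np.corrcoef(M.T)[0, 1]))
```

Shuffling each residual column independently destroys correlations not carried by the cluster means, which is the spirit of isolating the null model.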
A Generalized Linear Threshold Model for Multiple Cascades
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.153
Nishith Pathak, A. Banerjee, J. Srivastava
Abstract: This paper presents a generalized version of the linear threshold model for simulating multiple cascades on a network while allowing nodes to switch between them. The proposed model is shown to be a rapidly mixing Markov chain, and the corresponding steady-state distribution is used to estimate highly likely states of the cascades' spread through the network. Results on a variety of real-world networks demonstrate the high quality of the estimated solution.
Citations: 123
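The classic single-cascade linear threshold model, the base case being generalized here, can be sketched as: a node activates once the summed weight of its active in-neighbors reaches its threshold.

```python
import numpy as np

def linear_threshold_cascade(adj, seeds, thresholds):
    """Classic linear threshold cascade. adj[i, j] is the weight of
    edge i -> j; a node activates when the total weight from its
    already-active in-neighbors reaches its threshold."""
    n = adj.shape[0]
    active = np.zeros(n, dtype=bool)
    active[list(seeds)] = True
    changed = True
    while changed:
        influence = adj.T @ active            # incoming active weight
        newly = (~active) & (influence >= thresholds)
        changed = newly.any()
        active |= newly
    return active

# chain 0 -> 1 -> 2, plus an isolated node 3
adj = np.zeros((4, 4))
adj[0, 1] = adj[1, 2] = 1.0
active = linear_threshold_cascade(adj, {0}, np.full(4, 0.5))
```

The paper's model runs several such cascades at once and lets nodes switch between them, which turns the deterministic spread above into a Markov chain over joint cascade states.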
A Binary Decision Diagram-Based One-Class Classifier
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.84
Takuro Kutsuna
Abstract: We propose a novel approach to one-class classification problems in which a logical formula is used to estimate the region that covers all examples. The formula is viewed as a model representing a region and is approximated with respect to its hierarchical local densities. The approximation is performed efficiently via direct manipulation of a binary decision diagram, a compressed representation of a Boolean formula. The proposed method has only one parameter to tune, and that parameter can be selected properly with the help of the minimum description length principle, which requires no labeled training data. In other words, a one-class classifier is generated from unlabeled training data fully automatically. Experimental results show that the proposed method works well on both synthetic data and realistic data.
Citations: 4
Understanding of Internal Clustering Validation Measures
2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.35
Yanchi Liu, Zhongmou Li, Hui Xiong, Xuedong Gao, Junjie Wu
Abstract: Clustering validation has long been recognized as one of the issues vital to the success of clustering applications. In general, clustering validation can be categorized into two classes: external and internal. In this paper, we focus on internal clustering validation and present a detailed study of 11 widely used internal validation measures for crisp clustering. We investigate their validation properties from five conventional aspects of clustering. Experimental results show that S_Dbw is the only internal validation measure that performs well in all five aspects, while the other measures have limitations in different application scenarios.
Citations: 851
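To illustrate what an internal validation measure computes, here is a minimal implementation of one simple member of the family surveyed, the mean silhouette coefficient; S_Dbw itself combines scatter and between-cluster density terms and is more involved.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient: for each point, compare its mean
    intra-cluster distance a with the mean distance b to the nearest
    other cluster. Values near 1 mean tight, well-separated clusters.
    Assumes every cluster has at least two points."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    s = np.zeros(n)
    for i in range(n):
        same = (labels == labels[i]) & (np.arange(n) != i)
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

# two tight, well-separated pairs of points
X = np.array([[0., 0.], [0., 1.], [10., 0.], [10., 1.]])
score = silhouette(X, np.array([0, 0, 1, 1]))   # near 0.9
```

Like all internal measures, this uses only the data and the partition itself, with no external ground-truth labels.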