2009 Ninth IEEE International Conference on Data Mining最新文献

筛选
英文 中文
Redistricting Using Heuristic-Based Polygonal Clustering 基于启发式的多边形聚类重划
2009 Ninth IEEE International Conference on Data Mining Pub Date : 2009-12-06 DOI: 10.1109/ICDM.2009.126
Deepti Joshi, Leen-Kiat Soh, A. Samal
{"title":"Redistricting Using Heuristic-Based Polygonal Clustering","authors":"Deepti Joshi, Leen-Kiat Soh, A. Samal","doi":"10.1109/ICDM.2009.126","DOIUrl":"https://doi.org/10.1109/ICDM.2009.126","url":null,"abstract":"Redistricting is the process of dividing a geographic area into districts or zones. This process has been considered in the past as a problem that is computationally too complex for an automated system to be developed that can produce unbiased plans. In this paper we present a novel method for redistricting a geographic area using a heuristic-based approach for polygonal spatial clustering. While clustering geospatial polygons several complex issues need to be addressed – such as: removing order dependency, clustering all polygons assuming no outliers, and strategically utilizing domain knowledge to guide the clustering process. In order to address these special needs, we have developed the Constrained Polygonal Spatial Clustering (CPSC) algorithm that holistically integrates do-main knowledge in the form of cluster-level and instance-level constraints and uses heuristic functions to grow clusters. In order to illustrate the usefulness of our algorithm we have applied it to the problem of formation of unbiased congressional districts. Furthermore, we compare and contrast our algorithm with two other approaches proposed in the literature for redistricting, namely – graph partitioning and simulated annealing.","PeriodicalId":247645,"journal":{"name":"2009 Ninth IEEE International Conference on Data Mining","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125646646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 17
PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations PEGASUS:一个peta级图挖掘系统的实现和观察
2009 Ninth IEEE International Conference on Data Mining Pub Date : 2009-12-06 DOI: 10.1109/ICDM.2009.14
U. Kang, Charalampos E. Tsourakakis, C. Faloutsos
{"title":"PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations","authors":"U. Kang, Charalampos E. Tsourakakis, C. Faloutsos","doi":"10.1109/ICDM.2009.14","DOIUrl":"https://doi.org/10.1109/ICDM.2009.14","url":null,"abstract":"In this paper, we describe PEGASUS, an open source Peta Graph Mining library which performs typical graph mining tasks such as computing the diameter of the graph, computing the radius of each node and finding the connected components. As the size of graphs reaches several Giga-, Tera- or Peta-bytes, the necessity for such a library grows too. To the best of our knowledge, PEGASUS is the first such library, implemented on the top of the Hadoop platform, the open source version of MapReduce. Many graph mining operations (PageRank, spectral clustering, diameter estimation, connected components etc.) are essentially a repeated matrix-vector multiplication. In this paper we describe a very important primitive for PEGASUS, called GIM-V (Generalized Iterated Matrix-Vector multiplication). GIM-V is highly optimized, achieving (a) good scale-up on the number of available machines (b) linear running time on the number of edges, and (c) more than 5 times faster performance over the non-optimized version of GIM-V. Our experiments ran on M45, one of the top 50 supercomputers in the world. We report our findings on several real graphs, including one of the largest publicly available Web Graphs, thanks to Yahoo!, with 6,7 billion edges.","PeriodicalId":247645,"journal":{"name":"2009 Ninth IEEE International Conference on Data Mining","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114347627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 735
TrBagg: A Simple Transfer Learning Method and its Application to Personalization in Collaborative Tagging TrBagg:一种简单的迁移学习方法及其在协作标注个性化中的应用
2009 Ninth IEEE International Conference on Data Mining Pub Date : 2009-12-06 DOI: 10.1109/ICDM.2009.9
Toshihiro Kamishima, Masahiro Hamasaki, S. Akaho
{"title":"TrBagg: A Simple Transfer Learning Method and its Application to Personalization in Collaborative Tagging","authors":"Toshihiro Kamishima, Masahiro Hamasaki, S. Akaho","doi":"10.1109/ICDM.2009.9","DOIUrl":"https://doi.org/10.1109/ICDM.2009.9","url":null,"abstract":"The aim of transfer learning is to improve prediction accuracy on a target task by exploiting the training examples for tasks that are related to the target one. Transfer learning has received more attention in recent years, because this technique is considered to be helpful in reducing the cost of labeling. In this paper, we propose a very simple approach to transfer learning: TrBagg, which is the extension of bagging. TrBagg is composed of two stages: Many weak classifiers are first generated as in standard bagging, and these classifiers are then filtered based on their usefulness for the target task. This simplicity makes it easy to work reasonably well without severe tuning of learning parameters. Further, our algorithm equips an algorithmic scheme to avoid negative transfer. We applied TrBagg to personalized tag prediction tasks for social bookmarks Our approach has several convenient characteristics for this task such as adaptation to multiple tasks with low computational cost.","PeriodicalId":247645,"journal":{"name":"2009 Ninth IEEE International Conference on Data Mining","volume":"176 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114573427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 83
Uncoverning Groups via Heterogeneous Interaction Analysis 通过异质相互作用分析揭示组
2009 Ninth IEEE International Conference on Data Mining Pub Date : 2009-12-06 DOI: 10.1109/ICDM.2009.20
Lei Tang, Xufei Wang, Huan Liu
{"title":"Uncoverning Groups via Heterogeneous Interaction Analysis","authors":"Lei Tang, Xufei Wang, Huan Liu","doi":"10.1109/ICDM.2009.20","DOIUrl":"https://doi.org/10.1109/ICDM.2009.20","url":null,"abstract":"With the pervasive availability of Web 2.0 and social networking sites, people can interact with each other easily through various social media. For instance, popular sites like Del.icio.us, Flickr, and YouTube allow users to comment shared content (bookmark, photos, videos), and users can tag their own favorite content. Users can also connect to each other, and subscribe to or become a fan or a follower of others. These diverse individual activities result in a multi-dimensional network among actors, forming cross-dimension group structures with group members sharing certain similarities. It is challenging to effectively integrate the network information of multiple dimensions in order to discover cross-dimension group structures. In this work, we propose a two-phase strategy to identify the hidden structures shared across dimensions in multi-dimensional networks. We extract structural features from each dimension of the network via modularity analysis, and then integrate them all to find out a robust community structure among actors. Experiments on synthetic and real-world data validate the superiority of our strategy, enabling the analysis of collective behavior underneath diverse individual activities in a large scale.","PeriodicalId":247645,"journal":{"name":"2009 Ninth IEEE International Conference on Data Mining","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115042838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 184
A Deep Non-linear Feature Mapping for Large-Margin kNN Classification 基于深度非线性特征映射的大边界kNN分类
2009 Ninth IEEE International Conference on Data Mining Pub Date : 2009-12-06 DOI: 10.1109/ICDM.2009.27
Martin Renqiang Min, D. A. Stanley, Zineng Yuan, A. Bonner, Zhaolei Zhang
{"title":"A Deep Non-linear Feature Mapping for Large-Margin kNN Classification","authors":"Martin Renqiang Min, D. A. Stanley, Zineng Yuan, A. Bonner, Zhaolei Zhang","doi":"10.1109/ICDM.2009.27","DOIUrl":"https://doi.org/10.1109/ICDM.2009.27","url":null,"abstract":"KNN is one of the most popular data mining methods for classification, but it often fails to work well with inappropriate choice of distance metric or due to the presence of numerous class-irrelevant features. Linear feature transformation methods have been widely applied to extract class-relevant information to improve kNN classification, which is very limited in many applications. Kernels have also been used to learn powerful non-linear feature transformations, but these methods fail to scale to large datasets. In this paper, we present a scalable non-linear feature mapping method based on a deep neural network pretrained with Restricted Boltzmann Machines for improving kNN classification in a large-margin framework, which we call DNet-kNN. DNet-kNN can be used for both classification and for supervised dimensionality reduction. The experimental results on two benchmark handwritten digit datasets and one newsgroup text dataset show that DNet-kNN has much better performance than large-margin kNN using a linear mapping and kNN based on a deep autoencoder pretrained with Restricted Boltzmann Machines.","PeriodicalId":247645,"journal":{"name":"2009 Ninth IEEE International Conference on Data Mining","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131075857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 92
An Effective Approach to Inverse Frequent Set Mining 一种有效的反频繁集挖掘方法
2009 Ninth IEEE International Conference on Data Mining Pub Date : 2009-12-06 DOI: 10.1109/ICDM.2009.123
A. Guzzo, D. Saccá, Edoardo Serra
{"title":"An Effective Approach to Inverse Frequent Set Mining","authors":"A. Guzzo, D. Saccá, Edoardo Serra","doi":"10.1109/ICDM.2009.123","DOIUrl":"https://doi.org/10.1109/ICDM.2009.123","url":null,"abstract":"The inverse frequent set mining problem is the problem of computing a database on which a given collection of itemsets must emerge to be frequent. Earlier studies focused on investigating computational and approximability properties of this problem. In this paper, we face it under the pragmatic perspective of defining heuristic solution approaches that are effective and scalable in real scenarios. In particular, a general formulation of the problem is considered where minimum and maximum support constraints can be defined on each itemset, and where no bound is given beforehand on the size of the resulting output database. Within this setting, an algorithm is proposed that always satisfies the maximum support constraints, but which treats minimum support constraints as soft ones that are enforced as long as possible. A thorough experimentation evidences that minimum support constraints are hardly violated in practice, and that such negligible degradation in accuracy (which is unavoidable due to the theoretical intractability of the problem) is compensated by very good scaling performances.","PeriodicalId":247645,"journal":{"name":"2009 Ninth IEEE International Conference on Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130568855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 22
A New MCA-Based Divisive Hierarchical Algorithm for Clustering Categorical Data 一种基于mca的分类数据聚类新算法
2009 Ninth IEEE International Conference on Data Mining Pub Date : 2009-12-06 DOI: 10.1109/ICDM.2009.118
Tengke Xiong, Shengrui Wang, A. Mayers, E. Monga
{"title":"A New MCA-Based Divisive Hierarchical Algorithm for Clustering Categorical Data","authors":"Tengke Xiong, Shengrui Wang, A. Mayers, E. Monga","doi":"10.1109/ICDM.2009.118","DOIUrl":"https://doi.org/10.1109/ICDM.2009.118","url":null,"abstract":"Clustering categorical data faces two challenges, one is lacking of inherent similarity measure, and the other is that the clusters are prone to being embedded in different subspace. In this paper, we propose the first divisive hierarchical clustering algorithm for categorical data. The algorithm, which is based on Multiple Correspondence Analysis (MCA), is systematic, efficient and effective. In our algorithm, MCA plays an important role in analyzing the data globally. The proposed algorithm has five merits. First, our algorithm yields a dendrogram representing nested groupings of patterns and similarity levels at different granularities. Second, it is parameter-free, fully automatic and, most importantly, requires no assumption regarding the number of clusters. Third, it is independent of the order in which the data are processed. Forth, it is scalable to large data sets; and finally, using the novel data representation and Chi-square distance measures makes our algorithm capable of seamlessly discovering the clusters embedded in the subspaces. Experiments on both synthetic and real data demonstrate the superior performance of our algorithm.","PeriodicalId":247645,"journal":{"name":"2009 Ninth IEEE International Conference on Data Mining","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130779360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 21
To Trust or Not to Trust? Predicting Online Trusts Using Trust Antecedent Framework 相信还是不相信?基于信任先行框架的在线信任预测
2009 Ninth IEEE International Conference on Data Mining Pub Date : 2009-12-06 DOI: 10.1109/ICDM.2009.115
Viet-An Nguyen, Ee-Peng Lim, Jing Jiang, Aixin Sun
{"title":"To Trust or Not to Trust? Predicting Online Trusts Using Trust Antecedent Framework","authors":"Viet-An Nguyen, Ee-Peng Lim, Jing Jiang, Aixin Sun","doi":"10.1109/ICDM.2009.115","DOIUrl":"https://doi.org/10.1109/ICDM.2009.115","url":null,"abstract":"This paper analyzes the trustor and trustee factors that lead to inter-personal trust using a well studied Trust Antecedent framework in management science cite{mayer}. To apply these factors to trust ranking problem in online rating systems, we derive features that correspond to each factor and develop different trust ranking models. The advantage of this approach is that features relevant to trust can be systematically derived so as to achieve good prediction accuracy. Through a series of experiments on real data from Epinions, we show that even a simple model using the derived features yields good accuracy and outperforms MoleTrust, a trust propagation based model. SVM classifiers using these features also show improvements.","PeriodicalId":247645,"journal":{"name":"2009 Ninth IEEE International Conference on Data Mining","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132445115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 59
SLIDER: Mining Correlated Motifs in Protein-Protein Interaction Networks 滑块:在蛋白质-蛋白质相互作用网络中挖掘相关基序
2009 Ninth IEEE International Conference on Data Mining Pub Date : 2009-12-06 DOI: 10.1109/ICDM.2009.92
Peter Boyen, F. Neven, D. V. Dyck, A. V. Dijk, R. V. Ham
{"title":"SLIDER: Mining Correlated Motifs in Protein-Protein Interaction Networks","authors":"Peter Boyen, F. Neven, D. V. Dyck, A. V. Dijk, R. V. Ham","doi":"10.1109/ICDM.2009.92","DOIUrl":"https://doi.org/10.1109/ICDM.2009.92","url":null,"abstract":"Correlated motif mining (CMM) is the problem to find overrepresented pairs of patterns, called motif pairs, in interacting protein sequences. Algorithmic solutions for CMM thereby provide a computational method for predicting binding sites for protein interaction. In this paper, we adopt a motif-driven approach where the support of candidate motif pairs is evaluated in the network. We experimentally establish the superiority of the Chi-square-based support measure over other support measures. Furthermore, we obtain that CMM is an NP-hard problem for a large class of support measures (including Chi-square) and reformulate the search for correlated motifs as a combinatorial optimization problem. We then present the method SLIDER which uses local search with a neighborhood function based on sliding motifs and employs the Chi-square-based support measure. We show that SLIDER outperforms existing motif-driven CMM methods and scales to large protein-protein interaction networks.","PeriodicalId":247645,"journal":{"name":"2009 Ninth IEEE International Conference on Data Mining","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125523949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
Inverse Time Dependency in Convex Regularized Learning 凸正则化学习中的逆时间依赖
2009 Ninth IEEE International Conference on Data Mining Pub Date : 2009-12-06 DOI: 10.1109/ICDM.2009.28
Z. Zhu, Weizhu Chen, Chenguang Zhu, Gang Wang, Haixun Wang, Zheng Chen
{"title":"Inverse Time Dependency in Convex Regularized Learning","authors":"Z. Zhu, Weizhu Chen, Chenguang Zhu, Gang Wang, Haixun Wang, Zheng Chen","doi":"10.1109/ICDM.2009.28","DOIUrl":"https://doi.org/10.1109/ICDM.2009.28","url":null,"abstract":"In the conventional regularized learning, training time increases as the training set expands. Recent work on L2 linear SVM challenges this common sense by proposing the inverse time dependency on the training set size. In this paper, we first put forward a Primal Gradient Solver (PGS) to effectively solve the convex regularized learning problem. This solver is based on the stochastic gradient descent method and the Fenchel conjugate adjustment, employing the well-known online strongly convex optimization algorithm with logarithmic regret. We then theoretically prove the inverse dependency property of our PGS, embracing the previous work of the L2 linear SVM as a special case and enable the l_p-norm optimization to run within a bounded sphere, which qualifies more convex loss functions in PGS. We further illustrate this solver in three examples: SVM, logistic regression and regularized least square. Experimental results substantiate the property of the inverse dependency on training data size.","PeriodicalId":247645,"journal":{"name":"2009 Ninth IEEE International Conference on Data Mining","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126445681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信
小红书