2009 Ninth IEEE International Conference on Data Mining: Latest Papers

Redistricting Using Heuristic-Based Polygonal Clustering

2009 Ninth IEEE International Conference on Data Mining Pub Date : 2009-12-06 DOI: 10.1109/ICDM.2009.126

Deepti Joshi, Leen-Kiat Soh, A. Samal

Abstract: Redistricting is the process of dividing a geographic area into districts or zones. This process has been considered in the past as a problem that is computationally too complex for an automated system to be developed that can produce unbiased plans. In this paper we present a novel method for redistricting a geographic area using a heuristic-based approach for polygonal spatial clustering. While clustering geospatial polygons, several complex issues need to be addressed, such as removing order dependency, clustering all polygons assuming no outliers, and strategically utilizing domain knowledge to guide the clustering process. In order to address these special needs, we have developed the Constrained Polygonal Spatial Clustering (CPSC) algorithm that holistically integrates domain knowledge in the form of cluster-level and instance-level constraints and uses heuristic functions to grow clusters. In order to illustrate the usefulness of our algorithm we have applied it to the problem of formation of unbiased congressional districts. Furthermore, we compare and contrast our algorithm with two other approaches proposed in the literature for redistricting, namely graph partitioning and simulated annealing.

Citations: 17

PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations

2009 Ninth IEEE International Conference on Data Mining Pub Date : 2009-12-06 DOI: 10.1109/ICDM.2009.14

U. Kang, Charalampos E. Tsourakakis, C. Faloutsos

Abstract: In this paper, we describe PEGASUS, an open source Peta Graph Mining library which performs typical graph mining tasks such as computing the diameter of the graph, computing the radius of each node and finding the connected components. As the size of graphs reaches several Giga-, Tera- or Peta-bytes, the necessity for such a library grows too. To the best of our knowledge, PEGASUS is the first such library, implemented on top of the Hadoop platform, the open source version of MapReduce. Many graph mining operations (PageRank, spectral clustering, diameter estimation, connected components, etc.) are essentially a repeated matrix-vector multiplication. In this paper we describe a very important primitive for PEGASUS, called GIM-V (Generalized Iterated Matrix-Vector multiplication). GIM-V is highly optimized, achieving (a) good scale-up on the number of available machines, (b) linear running time on the number of edges, and (c) more than 5 times faster performance over the non-optimized version of GIM-V. Our experiments ran on M45, one of the top 50 supercomputers in the world. We report our findings on several real graphs, including one of the largest publicly available Web Graphs, thanks to Yahoo!, with 6.7 billion edges.

Citations: 735
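
The PEGASUS abstract's claim that many graph mining operations reduce to a repeated matrix-vector multiplication can be illustrated with a small in-memory sketch. The code below is not the PEGASUS implementation and does not use Hadoop; the function names `gim_v` and `connected_components`, and the operation names `combine2`, `combine_all` and `assign`, are labels assumed for this illustration. It finds connected components by replacing the usual multiply and sum with "pass the neighbour's label" and "take the minimum".

```python
def gim_v(matrix, v, combine2, combine_all, assign):
    """One generalized matrix-vector step over a sparse matrix {(i, j): m_ij}."""
    partial = {}
    for (i, j), m_ij in matrix.items():
        # Pair each stored matrix entry with the matching vector element.
        partial.setdefault(i, []).append(combine2(m_ij, v[j]))
    # Rows with no stored entries keep their old value (a simplification
    # made for this sketch).
    return [
        assign(v[i], combine_all(partial[i])) if i in partial else v[i]
        for i in range(len(v))
    ]


def connected_components(n, undirected_edges):
    """Label each node with the smallest node id in its component."""
    matrix = {}
    for a, b in undirected_edges:
        matrix[(a, b)] = 1
        matrix[(b, a)] = 1
    labels = list(range(n))
    while True:
        updated = gim_v(
            matrix,
            labels,
            combine2=lambda m_ij, v_j: m_ij * v_j,  # neighbour's label
            combine_all=min,                        # smallest neighbour label
            assign=min,                             # keep the smaller label
        )
        if updated == labels:  # fixed point reached: labels are stable
            return labels
        labels = updated


print(connected_components(6, [(0, 1), (1, 2), (3, 4)]))  # [0, 0, 0, 3, 3, 5]
```

Swapping in different operations (for example, a weighted sum with a damping term) would give a PageRank-style iteration over the same loop, which is the sense in which the step is "generalized".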

TrBagg: A Simple Transfer Learning Method and its Application to Personalization in Collaborative Tagging

2009 Ninth IEEE International Conference on Data Mining Pub Date : 2009-12-06 DOI: 10.1109/ICDM.2009.9

Toshihiro Kamishima, Masahiro Hamasaki, S. Akaho

Citations: 83

Uncovering Groups via Heterogeneous Interaction Analysis

2009 Ninth IEEE International Conference on Data Mining Pub Date : 2009-12-06 DOI: 10.1109/ICDM.2009.20

Lei Tang, Xufei Wang, Huan Liu

Abstract: With the pervasive availability of Web 2.0 and social networking sites, people can interact with each other easily through various social media. For instance, popular sites like Del.icio.us, Flickr, and YouTube allow users to comment on shared content (bookmarks, photos, videos), and users can tag their own favorite content. Users can also connect to each other, and subscribe to or become a fan or a follower of others. These diverse individual activities result in a multi-dimensional network among actors, forming cross-dimension group structures with group members sharing certain similarities. It is challenging to effectively integrate the network information of multiple dimensions in order to discover cross-dimension group structures. In this work, we propose a two-phase strategy to identify the hidden structures shared across dimensions in multi-dimensional networks. We extract structural features from each dimension of the network via modularity analysis, and then integrate them all to find a robust community structure among actors. Experiments on synthetic and real-world data validate the superiority of our strategy, enabling the analysis of collective behavior underneath diverse individual activities at a large scale.

Citations: 184
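
The abstract above relies on modularity analysis within each dimension of the network. As a rough illustration of the quantity involved, the sketch below computes the standard Newman modularity score of a given partition, separately for each dimension of a toy two-dimensional network. It is not the authors' two-phase method (no features are extracted or integrated here); the function name `modularity` and the dimension names `contacts` and `tags` are assumptions made for this example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping each node to its community label.
    """
    m = len(edges)
    intra = defaultdict(int)   # edges with both ends in the same community
    degree = defaultdict(int)  # summed node degree per community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Q = sum over communities of (intra-edge share) - (degree share)^2.
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two hypothetical interaction dimensions over the same six actors.
dimensions = {
    "contacts": [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)],
    "tags": [(0, 1), (1, 2), (3, 4), (4, 5), (0, 5)],
}
groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

for name, edges in dimensions.items():
    print(name, round(modularity(edges, groups), 4))
```

A partition that scores well in every dimension is the kind of shared group structure the abstract describes; how the per-dimension information is actually combined is left to the paper.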

A Deep Non-linear Feature Mapping for Large-Margin kNN Classification

2009 Ninth IEEE International Conference on Data Mining Pub Date : 2009-12-06 DOI: 10.1109/ICDM.2009.27

Martin Renqiang Min, D. A. Stanley, Zineng Yuan, A. Bonner, Zhaolei Zhang

Citations: 92

An Effective Approach to Inverse Frequent Set Mining

2009 Ninth IEEE International Conference on Data Mining Pub Date : 2009-12-06 DOI: 10.1109/ICDM.2009.123

A. Guzzo, D. Saccá, Edoardo Serra

Citations: 22

A New MCA-Based Divisive Hierarchical Algorithm for Clustering Categorical Data

2009 Ninth IEEE International Conference on Data Mining Pub Date : 2009-12-06 DOI: 10.1109/ICDM.2009.118

Tengke Xiong, Shengrui Wang, A. Mayers, E. Monga

Abstract: Clustering categorical data faces two challenges: one is the lack of an inherent similarity measure, and the other is that the clusters are prone to being embedded in different subspaces. In this paper, we propose the first divisive hierarchical clustering algorithm for categorical data. The algorithm, which is based on Multiple Correspondence Analysis (MCA), is systematic, efficient and effective. In our algorithm, MCA plays an important role in analyzing the data globally. The proposed algorithm has five merits. First, our algorithm yields a dendrogram representing nested groupings of patterns and similarity levels at different granularities. Second, it is parameter-free, fully automatic and, most importantly, requires no assumption regarding the number of clusters. Third, it is independent of the order in which the data are processed. Fourth, it is scalable to large data sets; and finally, using the novel data representation and Chi-square distance measures makes our algorithm capable of seamlessly discovering the clusters embedded in the subspaces. Experiments on both synthetic and real data demonstrate the superior performance of our algorithm.

Citations: 21

To Trust or Not to Trust? Predicting Online Trusts Using Trust Antecedent Framework

2009 Ninth IEEE International Conference on Data Mining Pub Date : 2009-12-06 DOI: 10.1109/ICDM.2009.115

Viet-An Nguyen, Ee-Peng Lim, Jing Jiang, Aixin Sun

Citations: 59

SLIDER: Mining Correlated Motifs in Protein-Protein Interaction Networks

2009 Ninth IEEE International Conference on Data Mining Pub Date : 2009-12-06 DOI: 10.1109/ICDM.2009.92

Peter Boyen, F. Neven, D. V. Dyck, A. V. Dijk, R. V. Ham

Citations: 5

Inverse Time Dependency in Convex Regularized Learning

2009 Ninth IEEE International Conference on Data Mining Pub Date : 2009-12-06 DOI: 10.1109/ICDM.2009.28

Z. Zhu, Weizhu Chen, Chenguang Zhu, Gang Wang, Haixun Wang, Zheng Chen

Citations: 5