Seventh IEEE International Conference on Data Mining (ICDM 2007): Latest Papers, Page 9

Community Learning by Graph Approximation

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.42

Bo Long, Xiaoyun Xu, Zhongfei Zhang, Philip S. Yu

Citations: 43

Solving Consensus and Semi-supervised Clustering Problems Using Nonnegative Matrix Factorization

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.98

Tao Li, C. Ding, Michael I. Jordan

Citations: 233

Discovering Temporal Communities from Social Network Documents

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.56

Ding Zhou, Isaac G. Councill, H. Zha, C. Lee Giles

Citations: 78

Extracting Product Comparisons from Discussion Boards

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.27

Ronen Feldman, Moshe Fresko, J. Goldenberg, O. Netzer, L. Ungar

Citations: 76

gApprox: Mining Frequent Approximate Patterns from a Massive Network

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-01 DOI: 10.1109/ICDM.2007.36

Cheng Chen, Xifeng Yan, Feida Zhu, Jiawei Han

Abstract: Recently, there arise a large number of graphs with massive sizes and complex structures in many new applications, such as biological networks, social networks, and the Web, demanding powerful data mining methods. Due to inherent noise or data diversity, it is crucial to address the issue of approximation, if one wants to mine patterns that are potentially interesting with tolerable variations. In this paper, we investigate the problem of mining frequent approximate patterns from a massive network and propose a method called gApprox. gApprox not only finds approximate network patterns, which is the key for many knowledge discovery applications on structural data, but also enriches the library of graph mining methodologies by introducing several novel techniques such as: (1) a complete and redundancy-free strategy to explore the new pattern space faced by gApprox; and (2) transform "frequent in an approximate sense" into an anti-monotonic constraint so that it can be pushed deep into the mining process. Systematic empirical studies on both real and synthetic data sets show that frequent approximate patterns mined from the worm protein-protein interaction network are biologically interesting and gApprox is both effective and efficient.

Citations: 95

Clustering Needles in a Haystack: An Information Theoretic Analysis of Minority and Outlier Detection

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-01 DOI: 10.1109/ICDM.2007.53

S. Ando

Citations: 52

Improving Text Classification by Using Encyclopedia Knowledge

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-01 DOI: 10.1109/ICDM.2007.77

Pu Wang, Jian Hu, Hua-Jun Zeng, Lijun Chen, Zheng Chen

Citations: 87

Sample Selection for Maximal Diversity

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-01 DOI: 10.1109/ICDM.2007.16

Feng Pan, Adam Roberts, L. McMillan, D. Threadgill, Wei Wang

Abstract: The problem of selecting a sample subset sufficient to preserve diversity arises in many applications. One example is in the design of recombinant inbred lines (RIL) for genetic association studies. In this context, genetic diversity is measured by how many alleles are retained in the resulting inbred strains. RIL panels that are derived from more than two parental strains, such as the collaborative cross (Churchill et al., 2004), present a particular challenge with regard to which of the many existing lab mouse strains should be included in the initial breeding funnel in order to maximize allele retention. A similar problem occurs in the study of customer reviews when selecting a subset of products with a maximal diversity in reviews. Diversity in this case implies the presence of a set of products having both positive and negative ranks for each customer. In this paper, we demonstrate that selecting an optimal diversity subset is an NP-complete problem via reduction to set cover. This reduction is sufficiently tight that greedy approximations to the set cover problem directly apply to maximizing diversity. We then suggest a slightly modified subset selection problem in which an initial greedy diversity solution is used to effectively prune an exhaustive search for all diversity subsets bounded from below by a specified coverage threshold. Extensive experiments on real datasets are performed to demonstrate the effectiveness and efficiency of our approach.
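
Editor's note: the abstract above says that greedy approximations to set cover apply directly to maximizing diversity. The following is a minimal Python sketch of the textbook greedy set cover heuristic, not the authors' implementation. The function name `greedy_cover`, the strain labels, and the allele identifiers are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, List, Set


def greedy_cover(candidates: Dict[Hashable, Set[Hashable]],
                 target: Set[Hashable]) -> List[Hashable]:
    """Greedily pick candidates until every coverable element of `target` is covered."""
    uncovered = set(target)
    chosen: List[Hashable] = []
    while uncovered and candidates:
        # Pick the candidate that covers the most still-uncovered elements.
        # Ties go to the candidate that appears first in `candidates`.
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            break  # the remaining elements cannot be covered by any candidate
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical toy data: each strain label maps to the set of alleles it carries.
strains = {
    "A": {1, 2, 3},
    "B": {3, 4},
    "C": {4, 5, 6},
    "D": {1, 6},
}
selected = greedy_cover(strains, {1, 2, 3, 4, 5, 6})
print(selected)
```

In this toy run the heuristic selects two strains that together retain all six alleles. The abstract's second step, using the greedy solution to prune an exhaustive search above a coverage threshold, is not sketched here.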

Citations: 7

DUSC: Dimensionality Unbiased Subspace Clustering

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-01 DOI: 10.1109/ICDM.2007.49

I. Assent, Ralph Krieger, Emmanuel Müller, T. Seidl

Citations: 142

Improving Knowledge Discovery in Document Collections through Combining Text Retrieval and Link Analysis Techniques

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-01 DOI: 10.1109/ICDM.2007.62

Wei Jin, R. Srihari, H. H. Ho, Xin Wu

Citations: 56