Seventh IEEE International Conference on Data Mining (ICDM 2007): Latest Papers

Community Learning by Graph Approximation
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date: 2007-10-28 DOI: 10.1109/ICDM.2007.42
Bo Long, Xiaoyun Xu, Zhongfei Zhang, Philip S. Yu
Abstract: Learning communities from a graph is an important problem in many domains. Different types of communities can be generalized as link-pattern based communities. In this paper, we propose a general model based on graph approximation to learn link-pattern based community structures from a graph. The model generalizes the traditional graph partitioning approaches and is applicable to learning various community structures. Under this model, we derive a family of algorithms that are flexible enough to learn various community structures and can easily incorporate prior knowledge of those structures. Experimental evaluation and theoretical analysis show the effectiveness and great potential of the proposed model and algorithms.
Citations: 43
Solving Consensus and Semi-supervised Clustering Problems Using Nonnegative Matrix Factorization
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date: 2007-10-28 DOI: 10.1109/ICDM.2007.98
Tao Li, C. Ding, Michael I. Jordan
Abstract: Consensus clustering and semi-supervised clustering are important extensions of the standard clustering paradigm. Consensus clustering (also known as aggregation of clustering) can improve clustering robustness, deal with distributed and heterogeneous data sources, and make use of multiple clustering criteria. Semi-supervised clustering can integrate various forms of background knowledge into clustering. In this paper, we show how consensus and semi-supervised clustering can be formulated within the framework of nonnegative matrix factorization (NMF). We show that this framework yields NMF-based algorithms that are: (1) extremely simple to implement; (2) provably correct and provably convergent. We conduct a wide range of comparative experiments that demonstrate the effectiveness of this NMF-based approach.
Citations: 233
Discovering Temporal Communities from Social Network Documents
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date: 2007-10-28 DOI: 10.1109/ICDM.2007.56
Ding Zhou, Isaac G. Councill, H. Zha, C. Lee Giles
Abstract: This paper studies the discovery of communities from social network documents produced over time, addressing the discovery of temporal trends in community memberships. We first formulate static community discovery at a single time period as a tripartite graph partitioning problem. Then we propose to discover the temporal communities by threading the statically derived communities in different time periods using a new constrained partitioning algorithm, which partitions graphs based on topology as well as prior information regarding vertex membership. We evaluate the proposed approach on synthetic datasets and a real-world dataset prepared from CiteSeer.
Citations: 78
Extracting Product Comparisons from Discussion Boards
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date: 2007-10-28 DOI: 10.1109/ICDM.2007.27
Ronen Feldman, Moshe Fresko, J. Goldenberg, O. Netzer, L. Ungar
Abstract: In recent years, product discussion forums have become a rich environment in which consumers and potential adopters exchange views and information. Researchers and practitioners are starting to extract user sentiment about products from user product reviews. Users often compare different products, stating which they like better and why. Extracting information about product comparisons poses a number of challenges; recognizing and normalizing entities (products) in the informal language of blogs and discussion groups require different techniques than those used for entity extraction in the more formal text of newspapers and scientific articles. We present a case study in extracting information about comparisons between running shoes and between cars, describe an effective methodology, and show how it produces insight into how consumers view the running shoe and car markets.
Citations: 76
gApprox: Mining Frequent Approximate Patterns from a Massive Network
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date: 2007-10-01 DOI: 10.1109/ICDM.2007.36
Cheng Chen, Xifeng Yan, Feida Zhu, Jiawei Han
Abstract: Recently, a large number of graphs with massive sizes and complex structures have arisen in many new applications, such as biological networks, social networks, and the Web, demanding powerful data mining methods. Due to inherent noise or data diversity, it is crucial to address the issue of approximation if one wants to mine patterns that are potentially interesting with tolerable variations. In this paper, we investigate the problem of mining frequent approximate patterns from a massive network and propose a method called gApprox. gApprox not only finds approximate network patterns, which is the key for many knowledge discovery applications on structural data, but also enriches the library of graph mining methodologies by introducing several novel techniques such as: (1) a complete and redundancy-free strategy to explore the new pattern space faced by gApprox; and (2) transforming "frequent in an approximate sense" into an anti-monotonic constraint so that it can be pushed deep into the mining process. Systematic empirical studies on both real and synthetic data sets show that frequent approximate patterns mined from the worm protein-protein interaction network are biologically interesting and that gApprox is both effective and efficient.
Citations: 95
Clustering Needles in a Haystack: An Information Theoretic Analysis of Minority and Outlier Detection
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date: 2007-10-01 DOI: 10.1109/ICDM.2007.53
S. Ando
Abstract: Identifying atypical objects is one of the traditional topics in machine learning. Recently, novel approaches, e.g., Minority Detection and One-class clustering, have gone further to identify clusters of atypical objects which strongly contrast with the rest of the data in terms of their distribution or density. This paper analyzes such tasks from an information theoretic perspective. Based on the Information Bottleneck formalization, these tasks can be interpreted as increasing the averaged atypicalness of the clusters while reducing the complexity of the clustering. This formalization yields a unifying view of the new approaches as well as classic outlier detection. We also present a scalable minimization algorithm which exploits the localized form of the cost function over individual clusters. The proposed algorithm is evaluated using simulated datasets and a text classification benchmark, in comparison with an existing method.
Citations: 52
Improving Text Classification by Using Encyclopedia Knowledge
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date: 2007-10-01 DOI: 10.1109/ICDM.2007.77
Pu Wang, Jian Hu, Hua-Jun Zeng, Lijun Chen, Zheng Chen
Abstract: The exponential growth of text documents available on the Internet has created an urgent need for accurate, fast, and general purpose text classification algorithms. However, the "bag of words" representation used for these classification methods is often unsatisfactory as it ignores relationships between important terms that do not co-occur literally. In order to deal with this problem, we integrate background knowledge (in our application, Wikipedia) into the process of classifying text documents. The experimental evaluation on Reuters newsfeeds and several other corpora shows that our classification results with encyclopedia knowledge are much better than the baseline "bag of words" methods.
Citations: 87
Sample Selection for Maximal Diversity
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date: 2007-10-01 DOI: 10.1109/ICDM.2007.16
Feng Pan, Adam Roberts, L. McMillan, D. Threadgill, Wei Wang
Abstract: The problem of selecting a sample subset sufficient to preserve diversity arises in many applications. One example is in the design of recombinant inbred lines (RIL) for genetic association studies. In this context, genetic diversity is measured by how many alleles are retained in the resulting inbred strains. RIL panels that are derived from more than two parental strains, such as the collaborative cross (Churchill et al., 2004), present a particular challenge with regard to which of the many existing lab mouse strains should be included in the initial breeding funnel in order to maximize allele retention. A similar problem occurs in the study of customer reviews when selecting a subset of products with a maximal diversity in reviews. Diversity in this case implies the presence of a set of products having both positive and negative ranks for each customer. In this paper, we demonstrate that selecting an optimal diversity subset is an NP-complete problem via reduction to set cover. This reduction is sufficiently tight that greedy approximations to the set cover problem directly apply to maximizing diversity. We then suggest a slightly modified subset selection problem in which an initial greedy diversity solution is used to effectively prune an exhaustive search for all diversity subsets bounded from below by a specified coverage threshold. Extensive experiments on real datasets are performed to demonstrate the effectiveness and efficiency of our approach.
Citations: 7
DUSC: Dimensionality Unbiased Subspace Clustering
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date: 2007-10-01 DOI: 10.1109/ICDM.2007.49
I. Assent, Ralph Krieger, Emmanuel Müller, T. Seidl
Abstract: To gain insight into today's large data resources, data mining provides automatic aggregation techniques. Clustering aims at grouping data such that objects within groups are similar while objects in different groups are dissimilar. In scenarios with many attributes or with noise, clusters are often hidden in subspaces of the data and do not show up in the full dimensional space. For these applications, subspace clustering methods aim at detecting clusters in any subspace. Existing subspace clustering approaches fall prey to an effect we call dimensionality bias. As dimensionality of subspaces varies, approaches which do not take this effect into account fail to separate clusters from noise. We give a formal definition of dimensionality bias and analyze consequences for subspace clustering. A dimensionality unbiased subspace clustering (DUSC) definition based on statistical foundations is proposed. In thorough experiments on synthetic and real world data, we show that our approach outperforms existing subspace clustering algorithms.
Citations: 142
Improving Knowledge Discovery in Document Collections through Combining Text Retrieval and Link Analysis Techniques
Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date: 2007-10-01 DOI: 10.1109/ICDM.2007.62
Wei Jin, R. Srihari, H. H. Ho, Xin Wu
Abstract: In this paper, we present Concept Chain Queries (CCQ), a special case of text mining in document collections focusing on detecting links between two topics across text documents. We interpret such a query as finding the most meaningful evidence trails across documents that connect these two topics. We propose to use link-analysis techniques over the extracted features provided by an Information Extraction Engine for finding new knowledge. A graphical text representation and mining model is proposed which combines information retrieval, association mining, and link analysis techniques. We present experiments on different datasets that demonstrate the effectiveness of our algorithm. Specifically, the algorithm generates ranked concept chains and evidence trails where the key terms representing significant relationships between topics are ranked high.
Citations: 56