Seventh IEEE International Conference on Data Mining (ICDM 2007)最新文献

An Efficient Spectral Algorithm for Network Community Discovery and Its Applications to Biological and Social Networks 网络社区发现的高效谱算法及其在生物和社会网络中的应用

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.72

Jianhua Ruan, Weixiong Zhang

{"title":"An Efficient Spectral Algorithm for Network Community Discovery and Its Applications to Biological and Social Networks","authors":"Jianhua Ruan, Weixiong Zhang","doi":"10.1109/ICDM.2007.72","DOIUrl":"https://doi.org/10.1109/ICDM.2007.72","url":null,"abstract":"Automatic discovery of community structures in complex networks is a fundamental task in many disciplines, including social science, engineering, and biology. A quantitative measure called modularity (Q) has been proposed to effectively assess the quality of community structures. Several community discovery algorithms have since been developed based on the optimization of Q. However, this optimization problem is NP-hard, and the existing algorithms have a low accuracy or are computationally expensive. In this paper, we present an efficient spectral algorithm for modularity optimization. When tested on a large number of synthetic or real-world networks, and compared to the existing algorithms, our method is efficient and and has a high accuracy. In addition, we have successfully applied our algorithm to detect interesting and meaningful community structures from real-world networks in different domains, including biology, medicine and social science. Due to space limitation, results of these applications are presented in a complete version of the paper available on our Website (http://cse .wustl.edu/ ~jruan/).","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122799265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 132

Optimal Subsequence Bijection 最优子序列双射

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.47

Longin Jan Latecki, Qiang Wang, Suzan Köknar-Tezel, V. Megalooikonomou

{"title":"Optimal Subsequence Bijection","authors":"Longin Jan Latecki, Qiang Wang, Suzan Köknar-Tezel, V. Megalooikonomou","doi":"10.1109/ICDM.2007.47","DOIUrl":"https://doi.org/10.1109/ICDM.2007.47","url":null,"abstract":"We consider the problem of elastic matching of sequences of real numbers. Since both a query and a target sequence may be noisy, i.e., contain some outlier elements, it is desirable to exclude the outlier elements from matching in order to obtain a robust matching performance. Moreover, in many applications like shape alignment or stereo correspondence it is also desirable to have a one-to-one and onto correspondence (bijection) between the remaining elements. We propose an algorithm that determines the optimal subsequence bijection (OSB) of a query and target sequence. The OSB is efficiently computed since we map the problem's solution to a cheapest path in a DAG (directed acyclic graph). We obtained excellent results on standard benchmark time series datasets. We compared OSB to Dynamic Time Warping (DTW) with and without warping window. We do not claim that OSB is always superior to DTW. However, our results demonstrate that skipping outlier elements as done by OSB can significantly improve matching results for many real datasets. Moreover, OSB is particularly suitable for partial matching. We applied it to the object recognition problem when only parts of contours are given. We obtained sequences representing shapes by representing object contours as sequences of curvatures.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114291811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 53

Can the Content of Public News Be Used to Forecast Abnormal Stock Market Behaviour? 公共新闻内容能否用于预测股市异常行为?

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.74

Calum S. Robertson, S. Geva, R. Wolff

引用次数: 14

Local Probabilistic Models for Link Prediction 链路预测的局部概率模型

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.108

Chao Wang, Venu Satuluri, S. Parthasarathy

引用次数: 330

Efficient Data Sampling in Heterogeneous Peer-to-Peer Networks 异构点对点网络中的高效数据采样

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.71

Benjamin Arai, Song Lin, D. Gunopulos

引用次数: 10

Social Network Extraction of Academic Researchers 学术研究人员社交网络提取

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.30

Jie Tang, Duo Zhang, Limin Yao

{"title":"Social Network Extraction of Academic Researchers","authors":"Jie Tang, Duo Zhang, Limin Yao","doi":"10.1109/ICDM.2007.30","DOIUrl":"https://doi.org/10.1109/ICDM.2007.30","url":null,"abstract":"This paper addresses the issue of extraction of an academic researcher social network. By researcher social network extraction, we are aimed at finding, extracting, and fusing the 'semantic '-based profiling information of a researcher from the Web. Previously, social network extraction was often undertaken separately in an ad-hoc fashion. This paper first gives a formalization of the entire problem. Specifically, it identifies the 'relevant documents' from the Web by a classifier. It then proposes a unified approach to perform the researcher profiling using conditional random fields (CRF). It integrates publications from the existing bibliography datasets. In the integration, it proposes a constraints-based probabilistic model to name disambiguation. Experimental results on an online system show that the unified approach to researcher profiling significantly outperforms the baseline methods of using rule learning or classification. Experimental results also indicate that our method to name disambiguation performs better than the baseline method using unsupervised learning. The methods have been applied to expert finding. Experiments show that the accuracy of expert finding can be significantly improved by using the proposed methods.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126265327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 141

General Averaged Divergence Analysis 一般平均散度分析

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.105

D. Tao, Xuelong Li, Xindong Wu, S. Maybank

引用次数: 67

Using Burstiness to Improve Clustering of Topics in News Streams 利用突发性改进新闻流中主题聚类

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.17

Qi He, Kuiyu Chang, Ee-Peng Lim

{"title":"Using Burstiness to Improve Clustering of Topics in News Streams","authors":"Qi He, Kuiyu Chang, Ee-Peng Lim","doi":"10.1109/ICDM.2007.17","DOIUrl":"https://doi.org/10.1109/ICDM.2007.17","url":null,"abstract":"Specialists who analyze online news have a hard time separating the wheat from the chaff. Moreover, automatic data-mining techniques like clustering of news streams into topical groups can fully recover the underlying true class labels of data if and only if all classes are well separated. In reality, especially for news streams, this is clearly not the case. The question to ask is thus this: if we cannot recover the full C classes by clustering, what is the largest K < C clusters we can find that best resemble the K underlying classes? Using the intuition that bursty topics are more likely to correspond to important events that are of interest to analysts, we propose several new bursty vector space models (B-VSM)for representing a news document. B-VSM takes into account the burstiness (across the full corpus and whole duration) of each constituent word in a document at the time of publication. We benchmarked our B-VSM against the classical TFIDF-VSM on the task of clustering a collection of news stream articles with known topic labels. Experimental results show that B-VSM was able to find the burstiest clusters/topics. Further, it also significantly improved the recall and precision for the top K clusters/topics.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131433775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 58

Computing Correlation Anomaly Scores Using Stochastic Nearest Neighbors 利用随机近邻计算相关异常分数

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.12

T. Idé, S. Papadimitriou, M. Vlachos

引用次数: 81

High-Speed Function Approximation 高速函数逼近

Seventh IEEE International Conference on Data Mining (ICDM 2007) Pub Date : 2007-10-28 DOI: 10.1109/ICDM.2007.107

Biswanath Panda, Mirek Riedewald, J. Gehrke, S. Pope

引用次数: 5