2017 IEEE International Conference on Data Mining (ICDM)最新文献_第8页

Many Heads are Better than One: Local Community Detection by the Multi-walker Chain 多头比一个好:多步行者链的本地社区检测

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.11

Yuchen Bian, Jingchao Ni, Wei Cheng, Xiang Zhang

{"title":"Many Heads are Better than One: Local Community Detection by the Multi-walker Chain","authors":"Yuchen Bian, Jingchao Ni, Wei Cheng, Xiang Zhang","doi":"10.1109/ICDM.2017.11","DOIUrl":"https://doi.org/10.1109/ICDM.2017.11","url":null,"abstract":"Local community detection (or local clustering) is of fundamental importance in large network analysis. Random walk based methods have been routinely used in this task. Most existing random walk methods are based on the single-walker model. However, without any guidance, a single-walker may not be adequate to effectively capture the local cluster. In this paper, we study a multi-walker chain (MWC) model, which allows multiple walkers to explore the network. Each walker is influenced (or pulled back) by all other walkers when deciding the next steps. This helps the walkers to stay as a group and within the cluster. We introduce two measures based on the mean and standard deviation of the visiting probabilities of the walkers. These measures not only can accurately identify the local cluster, but also help detect the cluster center and boundary, which cannot be achieved by the existing single-walker methods. We provide rigorous theoretical foundation for MWC, and devise efficient algorithms to compute it. Extensive experimental results on a variety of real-world networks demonstrate that MWC outperforms the state-of-the-art local community detection methods by a large margin.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124477805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 32

AutoLearn — Automated Feature Generation and Selection 自动学习-自动特征生成和选择

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.31

A. Kaul, Saket Maheshwary, Vikram Pudi

引用次数: 86

Automatic Classification of Music Genre Using Masked Conditional Neural Networks 基于屏蔽条件神经网络的音乐体裁自动分类

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.125

Fady Medhat, D. Chesmore, John A. Robinson

引用次数: 13

Large Scale Kernel Methods for Online AUC Maximization 在线AUC最大化的大规模核方法

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.18

Yi Ding, Chenghao Liu, P. Zhao, S. Hoi

{"title":"Large Scale Kernel Methods for Online AUC Maximization","authors":"Yi Ding, Chenghao Liu, P. Zhao, S. Hoi","doi":"10.1109/ICDM.2017.18","DOIUrl":"https://doi.org/10.1109/ICDM.2017.18","url":null,"abstract":"Learning to optimize AUC performance for classifying label imbalanced data in online scenarios has been extensively studied in recent years. Most of the existing work has attempted to address the problem directly in the original feature space, which may not suitable for non-linearly separable datasets. To solve this issue, some kernel-based learning methods are proposed for non-linearly separable datasets. However, such kernel approaches have been shown to be inefficient and failed to scale well on large scale datasets in practice. Taking this cue, in this work, we explore the use of scalable kernel-based learning techniques as surrogates to existing approaches: random Fourier features and Nyström method, for tackling the problem and bring insights to the differences between the two methods based on their online performance. In contrast to the conventional kernel-based learning methods which suffer from high computational complexity of the kernel matrix, our proposed approaches elevate this issue with linear features that approximate the kernel function/matrix. Specifically, two different surrogate kernel-based learning models are presented for addressing the online AUC maximization task: (i) the Fourier Online AUC Maximization (FOAM) algorithm that samples the basis functions from a data-independent distribution to approximate the kernel functions; and (ii) the Nyström Online AUC Maximization (NOAM) algorithm that samples a subset of instances from the training data to approximate the kernel matrix by a low rank matrix. Another novelty of the present work is the proposed mini-batch Online Gradient Descent method for model updating to control the noise and reduce the variance of gradients. We provide theoretical analyses for the two proposed algorithms. Empirical studies on commonly used large scale datasets show that the proposed algorithms outperformed existing state-of-the-art methods in terms of both AUC performance and computational efficiency.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128907025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 21

Autoregressive Tensor Factorization for Spatio-Temporal Predictions 时空预测的自回归张量分解

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.146

Koh Takeuchi, H. Kashima, N. Ueda

{"title":"Autoregressive Tensor Factorization for Spatio-Temporal Predictions","authors":"Koh Takeuchi, H. Kashima, N. Ueda","doi":"10.1109/ICDM.2017.146","DOIUrl":"https://doi.org/10.1109/ICDM.2017.146","url":null,"abstract":"Analysis of spatio-temporal data is a common research topic that requires the interpolations of unknown locations and the predictions of feature observations by utilizing information about where and when the data were observed. One of the most difficult problems is to make predictions of unknown locations. Tensor factorization methods are popular in this field because of their capability of handling multiple types of spatio-temporal data, dealing with missing values, and providing computationally efficient parameter estimation procedures. However, unlike traditional approaches such as spatial autoregressive models, the existing tensor factorization methods have not tried to learn spatial autocorrelations. These methods employ previously inferred spatial dependencies, often resulting in poor performances on the problem of making interpolations and predictions of unknown locations. In this paper, we propose a new tensor factorization method that estimates low-rank latent factors by simultaneously learning the spatial and temporal autocorrelations. We introduce new spatial autoregressive regularizers based on existing spatial autoregressive models and provide an efficient estimation procedure. With experiments on publicly available traffic transporting data, we demonstrate that our proposed method significantly improves the predictive performances in our problems in comparison to the existing state-of-the-art spatio-temporal analysis methods.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129261885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 43

Theoretically and Empirically High Quality Estimation of Closeness Centrality 接近中心性的理论与经验高质量估计

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.126

Shogo Murai

{"title":"Theoretically and Empirically High Quality Estimation of Closeness Centrality","authors":"Shogo Murai","doi":"10.1109/ICDM.2017.126","DOIUrl":"https://doi.org/10.1109/ICDM.2017.126","url":null,"abstract":"In the field of network analysis, centrality values of graph nodes, which represents the importance of nodes, have been widely studied. In this paper, we focus on one of the most basic centrality measures: closeness centrality. Since the exact computation of closeness centrality for all nodes of a network is prohibitively costly for massive networks, algorithms for estimating closeness centrality have been studied. In previous works, theoretical bounds on relative error have been improved. However, for complex networks such as social networks, empirical estimation qualities have hardly been improved since the sampling-based algorithm was proposed. In this paper, we propose simple and highly scalable algorithms for estimating closeness centrality of undirected networks. Our algorithms have theoretically and empirically better estimation quality than previous ones. As a result, our algorithms achieve strong quality guarantees and experimentally small relative errors at the same time. Also, our algorithms can be extended to strongly connected directed networks. Moreover, we can apply our algorithms to weighted centrality, in which nodes may have different weight, with slight modification.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129320876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Balanced Distribution Adaptation for Transfer Learning 迁移学习的平衡分布适应

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.150

Jindong Wang, Yiqiang Chen, Shuji Hao, Wenjie Feng, Zhiqi Shen

{"title":"Balanced Distribution Adaptation for Transfer Learning","authors":"Jindong Wang, Yiqiang Chen, Shuji Hao, Wenjie Feng, Zhiqi Shen","doi":"10.1109/ICDM.2017.150","DOIUrl":"https://doi.org/10.1109/ICDM.2017.150","url":null,"abstract":"Transfer learning has achieved promising results by leveraging knowledge from the source domain to annotate the target domain which has few or none labels. Existing methods often seek to minimize the distribution divergence between domains, such as the marginal distribution, the conditional distribution or both. However, these two distances are often treated equally in existing algorithms, which will result in poor performance in real applications. Moreover, existing methods usually assume that the dataset is balanced, which also limits their performances on imbalanced tasks that are quite common in real problems. To tackle the distribution adaptation problem, in this paper, we propose a novel transfer learning approach, named as Balanced Distribution Adaptation (BDA), which can adaptively leverage the importance of the marginal and conditional distribution discrepancies, and several existing methods can be treated as special cases of BDA. Based on BDA, we also propose a novel Weighted Balanced Distribution Adaptation (W-BDA) algorithm to tackle the class imbalance issue in transfer learning. W-BDA not only considers the distribution adaptation between domains but also adaptively changes the weight of each class. To evaluate the proposed methods, we conduct extensive experiments on several transfer learning tasks, which demonstrate the effectiveness of our proposed algorithms over several state-of-the-art methods.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"45 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134382556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 369

Matrix Profile VIII: Domain Agnostic Online Semantic Segmentation at Superhuman Performance Levels 矩阵概要VIII:超越人类性能水平的领域不可知在线语义分割

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.21

Shaghayegh Gharghabi, Yifei Ding, Chin-Chia Michael Yeh, Kaveh Kamgar, Liudmila Ulanova, Eamonn J. Keogh

{"title":"Matrix Profile VIII: Domain Agnostic Online Semantic Segmentation at Superhuman Performance Levels","authors":"Shaghayegh Gharghabi, Yifei Ding, Chin-Chia Michael Yeh, Kaveh Kamgar, Liudmila Ulanova, Eamonn J. Keogh","doi":"10.1109/ICDM.2017.21","DOIUrl":"https://doi.org/10.1109/ICDM.2017.21","url":null,"abstract":"Unsupervised semantic segmentation in the time series domain is a much-studied problem due to its potential to detect unexpected regularities and regimes in poorly understood data. However, the current techniques have several shortcomings, which have limited the adoption of time series semantic segmentation beyond academic settings for three primary reasons. First, most methods require setting/learning many parameters and thus may have problems generalizing to novel situations. Second, most methods implicitly assume that all the data is segmentable, and have difficulty when that assumption is unwarranted. Finally, most research efforts have been confined to the batch case, but online segmentation is clearly more useful and actionable. To address these issues, we present an algorithm which is domain agnostic, has only one easily determined parameter, and can handle data streaming at a high rate. In this context, we test our algorithm on the largest and most diverse collection of time series datasets ever considered, and demonstrate our algorithm's superiority over current solutions. Furthermore, we are the first to show that semantic segmentation may be possible at superhuman performance levels.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"7 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114128035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 76

Multi-party Sparse Discriminant Learning 多方稀疏判别学习

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.86

Jiang Bian, Haoyi Xiong, Wei Cheng, Wenqing Hu, Zhishan Guo, Yanjie Fu

{"title":"Multi-party Sparse Discriminant Learning","authors":"Jiang Bian, Haoyi Xiong, Wei Cheng, Wenqing Hu, Zhishan Guo, Yanjie Fu","doi":"10.1109/ICDM.2017.86","DOIUrl":"https://doi.org/10.1109/ICDM.2017.86","url":null,"abstract":"Sparse Discriminant Analysis (SDA) has been widely used to improve the performance of classical Fisher's Linear Discriminant Analysis in supervised metric learning, feature selection and classification. With the increasing needs of distributed data collection, storage and processing, enabling the Sparse Discriminant Learning to embrace the Multi-Party distributed computing environments becomes an emerging research topic. This paper proposes a novel Multi-Party SDA algorithm, which can learn SDA models effectively without sharing any raw dataand basic statistics among machines. The proposed algorithm 1) leverages the direct estimation of SDA [1] to derive a distributed loss function for the discriminant learning, 2) parameterizes the distributed loss function with local/global estimates through bootstrapping, and 3) approximates a global estimation of linear discriminant projection vector by optimizing the \"distributed bootstrapping loss function\" with gossip-based stochastic gradient descent. Experimental results on both synthetic and real-world benchmark datasets show that our algorithm can compete with the centralized SDA with similar performance, and significantly outperforms the most recent distributed SDA [2] in terms of accuracy and F1-score.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122314527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

Tensor Based Relations Ranking for Multi-relational Collective Classification 多关系集体分类中基于张量的关系排序

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.112

Chao Han, Qingyao Wu, M. Ng, Jiezhang Cao, Mingkui Tan, Jian Chen

{"title":"Tensor Based Relations Ranking for Multi-relational Collective Classification","authors":"Chao Han, Qingyao Wu, M. Ng, Jiezhang Cao, Mingkui Tan, Jian Chen","doi":"10.1109/ICDM.2017.112","DOIUrl":"https://doi.org/10.1109/ICDM.2017.112","url":null,"abstract":"In this paper, we study relations ranking and object classification for multi-relational data where objects are interconnected by multiple relations. The relations among objects should be exploited for achieving a good classification. While most existing approaches exploit either by directly counting the number of connections among objects or by learning the weight of each relation from labeled data only. In this paper, we propose an algorithm, TensorRRCC, which is able to determine the ranking of relations and the labels of objects simultaneously. Our basic idea is that highly ranked relations within a class should play more important roles in object classification, and class membership information is important for determining a ranking quality over the relations w.r.t. a specific learning task. TensorRRCC implements the idea by modeling a Markov chain on transition probability graphs from connection and feature information with both labeled and unlabeled objects and propagates the ranking scores of relations and relevant classes of objects. An iterative progress is proposed to solve a set of tensor equations to obtain the stationary distribution of relations and objects. We compared our algorithm with current collective classification algorithms on two real-world data sets and the experimental results show the superiority of our method.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"347 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125804800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3