2017 IEEE International Conference on Data Mining (ICDM)最新文献_第10页

Relational Mixture of Experts: Explainable Demographics Prediction with Behavioral Data 专家的关系混合:可解释的人口统计预测与行为数据

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.45

M. Oyamada, S. Nakadai

{"title":"Relational Mixture of Experts: Explainable Demographics Prediction with Behavioral Data","authors":"M. Oyamada, S. Nakadai","doi":"10.1109/ICDM.2017.45","DOIUrl":"https://doi.org/10.1109/ICDM.2017.45","url":null,"abstract":"Given a collection of basic customer demographics (e.g., age and gender) andtheir behavioral data (e.g., item purchase histories), how can we predictsensitive demographics (e.g., income and occupation) that not every customermakes available?This demographics prediction problem is modeled as a classification task inwhich a customer's sensitive demographic y is predicted from his featurevector x. So far, two lines of work have tried to produce a\"good\" feature vector x from the customer's behavioraldata: (1) application-specific feature engineering using behavioral data and (2) representation learning (such as singular value decomposition or neuralembedding) on behavioral data. Although these approaches successfullyimprove the predictive performance, (1) designing a good feature requiresdomain experts to make a great effort and (2) features obtained fromrepresentation learning are hard to interpret. To overcome these problems, we present a Relational Infinite SupportVector Machine (R-iSVM), a mixture-of-experts model that can leveragebehavioral data. Instead of augmenting the feature vectors of customers, R-iSVM uses behavioral data to find out behaviorally similar customerclusters and constructs a local prediction model at each customer cluster. In doing so, R-iSVM successfully improves the predictive performance withoutrequiring application-specific feature designing and hard-to-interpretrepresentations. Experimental results on three real-world datasets demonstrate the predictiveperformance and interpretability of R-iSVM. Furthermore, R-iSVM can co-existwith previous demographics prediction methods to further improve theirpredictive performance.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114803297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

Fast Compressive Spectral Clustering 快速压缩光谱聚类

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.120

Tingshu Li, Yiming Zhang, Dongsheng Li, Xinwang Liu, Yuxing Peng

{"title":"Fast Compressive Spectral Clustering","authors":"Tingshu Li, Yiming Zhang, Dongsheng Li, Xinwang Liu, Yuxing Peng","doi":"10.1109/ICDM.2017.120","DOIUrl":"https://doi.org/10.1109/ICDM.2017.120","url":null,"abstract":"Compressive spectral clustering (CSC) efficiently leverages graph filter and random sampling techniques to speed up clustering process. However, we find that CSC algorithm suffers from two main problems: i) The direct use of the dichotomy and eigencount techniques for estimating laplacian matrix’s k-th eigenvalue is expensive. ii) The computation of polynomial approximation repeats in each iteration for every cluster in the interpolation process, which occupies most of the computation time of CSC. To address these problems, we propose a new approach called FCSC for fast compressive spectral clustering. FCSC addresses the first problem by assuming that the eigenvalues approximately satisfy local uniform distribution, and addresses the second problem by recalculating the pairwise similarity between nodes with low-dimensional representation to reconstruct denoised laplacian matrix. The time complexity of reconstruction is linear with the number of non-zeros in laplacian matrix. As experimentally demonstrated on artificial and real-world datasets, our approach significantly reduces the computation time while preserving high clustering accuracy comparable to previous designs, verifying the effectiveness of FCSC.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122176977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

LCD: A Fast Contrastive Divergence Based Algorithm for Restricted Boltzmann Machine 一种基于快速对比发散的受限玻尔兹曼机算法

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.131

Lin Ning, Randall Pittman, Xipeng Shen

引用次数: 15

Statistical Link Label Modeling for Sign Prediction: Smoothing Sparsity by Joining Local and Global Information 用于符号预测的统计链接标签建模:通过连接局部和全局信息平滑稀疏性

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.135

Amin Javari, Hongxiang Qiu, Elham Barzegaran, M. Jalili, K. Chang

{"title":"Statistical Link Label Modeling for Sign Prediction: Smoothing Sparsity by Joining Local and Global Information","authors":"Amin Javari, Hongxiang Qiu, Elham Barzegaran, M. Jalili, K. Chang","doi":"10.1109/ICDM.2017.135","DOIUrl":"https://doi.org/10.1109/ICDM.2017.135","url":null,"abstract":"One of the major issues in signed networks is to use network structure to predict the missing sign of an edge. In this paper, we introduce a novel probabilistic approach for the sign prediction problem. The main characteristic of the proposed models is their ability to adapt to the sparsity level of an input network. Building a model that has an ability to adapt to the sparsity of the data has not yet been considered in the previous related works. We suggest that there exists a dilemma between local and global structures and attempt to build sparsity adaptive models by resolving this dilemma. To this end, we propose probabilistic prediction models based on local and global structures and integrate them based on the concept of smoothing. The model relies more on the global structures when the sparsity increases, whereas it gives more weights to the information obtained from local structures for low levels of the sparsity. The proposed model is assessed on three real-world signed networks, and the experiments reveal its consistent superiority over the state of the art methods. As compared to the previous methods, the proposed model not only better handles the sparsity problem, but also has lower computational complexity and can be updated using real-time data streams.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":" 656","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120829290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 18

iNEAT: Incomplete Network Alignment iNEAT:不完全网络对齐

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.160

Si Zhang, Hanghang Tong, Jie Tang, Jiejun Xu, Wei Fan

引用次数: 27

Aspect Sentiment Model for Micro Reviews 面向微评论的面向情感模型

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.83

Reinald Kim Amplayo, Seung-won Hwang

引用次数: 14

Informing the Use of Hyperparameter Optimization Through Metalearning 通过元学习来通知超参数优化的使用

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.137

Samantha Sanders, C. Giraud-Carrier

引用次数: 34

BiCycle: Item Recommendation with Life Cycles 自行车:有生命周期的项目推荐

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.39

Xinyue Liu, Y. Song, C. Aggarwal, Yao Zhang, Xiangnan Kong

{"title":"BiCycle: Item Recommendation with Life Cycles","authors":"Xinyue Liu, Y. Song, C. Aggarwal, Yao Zhang, Xiangnan Kong","doi":"10.1109/ICDM.2017.39","DOIUrl":"https://doi.org/10.1109/ICDM.2017.39","url":null,"abstract":"Recommender systems have attracted much attention in last decades, which can help the users explore new items in many applications. As a popular technique in recommender systems, item recommendation works by recommending items to users based on their historical interactions. Conventional item recommendation methods usually assume that users and items are stationary, which is not always the case in real-world applications. Many time-aware item recommendation models have been proposed to take the temporal effects into the considerations based on the absolute time stamps associated with observed interactions. We show that using absolute time to model temporal effects can be limited in some circumstances. In this work, we propose to model the temporal dynamics of both users and items in item recommendation based on their life cycles. This problem is very challenging to solve since the users and items can co-evolve in their life cycles and the sparseness of the data become more severe when we consider the life cycles of both users and items. A novel time-aware item recommendation model called BiCycle is proposed to address these challenges. BiCycle is designed based on two important observations: 1) correlated users or items usually share similar patterns in the similar stages of their life cycles. 2) user preferences and item characters can evolve gradually over different stages of their life cycles. Extensive experiments conducted on three real-world datasets demonstrate the proposed approach can significantly improve the performance of recommendation tasks by considering the inner life cycles of both users and items.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125650081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

MDL for Causal Inference on Discrete Data 离散数据因果推理的MDL

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.87

Kailash Budhathoki, Jilles Vreeken

{"title":"MDL for Causal Inference on Discrete Data","authors":"Kailash Budhathoki, Jilles Vreeken","doi":"10.1109/ICDM.2017.87","DOIUrl":"https://doi.org/10.1109/ICDM.2017.87","url":null,"abstract":"The algorithmic Markov condition states that the most likely causal direction between two random variables X and Y can be identified as the direction with the lowest Kolmogorov complexity. This notion is very powerful as it can detect any causal dependency that can be explained by a physical process. However, due to the halting problem, it is also not computable. In this paper we propose an computable instantiation that provably maintains the key aspects of the ideal. We propose to approximate Kolmogorov complexity via the Minimum Description Length (MDL) principle, using a score that is mini-max optimal with regard to the model class under consideration. This means that even in an adversarial setting, the score degrades gracefully, and we are still maximally able to detect dependencies between the marginal and the conditional distribution. As a proof of concept, we propose CISC, a linear-time algorithm for causal inference by stochastic complexity, for pairs of univariate discrete variables. Experiments show that CISC is highly accurate on synthetic, benchmark, as well as real-world data, outperforming the state of the art by a margin, and scales extremely well with regard to sample and domain sizes.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121479877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 32

A Probabilistic Approach for Learning with Label Proportions Applied to the US Presidential Election 标签比例的概率学习方法在美国总统选举中的应用

2017 IEEE International Conference on Data Mining (ICDM) Pub Date : 2017-11-01 DOI: 10.1109/ICDM.2017.54

Tao Sun, D. Sheldon, Brendan OConnor

{"title":"A Probabilistic Approach for Learning with Label Proportions Applied to the US Presidential Election","authors":"Tao Sun, D. Sheldon, Brendan OConnor","doi":"10.1109/ICDM.2017.54","DOIUrl":"https://doi.org/10.1109/ICDM.2017.54","url":null,"abstract":"Ecological inference (EI) is a classical problem from political science to model voting behavior of individuals given only aggregate election results. Flaxman et al. recently formulated EI as machine learning problem using distribution regression, and applied it to analyze US presidential elections. However, distribution regression unnecessarily aggregates individual-level covariates available from census microdata, and ignores known structure of the aggregation mechanism. We instead formulate the problem as learning with label proportions (LLP), and develop a new, probabilistic, LLP method to solve it. Our model is the straightforward one where individual votes are latent variables. We use cardinality potentials to efficiently perform exact inference over latent variables during learning, and introduce a novel message-passing algorithm to extend cardinality potentials to multivariate probability models for use within multiclass LLP problems. We show experimentally that LLP outperforms distribution regression for predicting individual-level attributes, and that our method is as good as or better than existing state-of-the-art LLP methods.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132164103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 23