Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: Latest Papers (Page 5)

From Group to Individual Labels Using Deep Features

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pub Date: 2015-08-10. DOI: 10.1145/2783258.2783380

Dimitrios Kotzias, Misha Denil, Nando de Freitas, Padhraic Smyth

Citations: 289

Model Multiple Heterogeneity via Hierarchical Multi-Latent Space Learning

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pub Date: 2015-08-10. DOI: 10.1145/2783258.2783330

Pei Yang, Jingrui He

Citations: 13

A Clustering-Based Framework to Control Block Sizes for Entity Resolution

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pub Date: 2015-08-10. DOI: 10.1145/2783258.2783396

Jeffrey Fisher, P. Christen, Qing Wang, E. Rahm

Entity resolution (ER) is a common data cleaning task that involves determining which records from one or more data sets refer to the same real-world entities. Because a pairwise comparison of all records scales quadratically with the number of records in the data sets to be matched, it is common to use blocking or indexing techniques to reduce the number of comparisons required. These techniques split the data sets into blocks, and only records within the same block are compared with each other. Most existing blocking techniques do not provide control over the size of the generated blocks, even though this control is important in many practical applications of ER, such as privacy-preserving record linkage and real-time ER. We propose two novel hierarchical clustering approaches that can generate blocks within a specified size range, and we present a penalty function that allows control of the trade-off between block quality and block size in the clustering process. We evaluate our techniques on three real-world data sets and compare them against three baseline approaches. The results show that our proposed techniques perform well on the pairs completeness and reduction ratio measures compared to the baseline approaches, while also satisfying the block size restrictions.

Citations: 61
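The blocking abstract above evaluates techniques by pairs completeness and reduction ratio. Under their usual definitions (the fraction of true matching pairs that survive blocking, and one minus the fraction of all record pairs that are still compared), both can be computed as in the following sketch. The toy records, the first-letter blocking key, and the naive `max_size` split of oversized blocks are hypothetical choices for illustration; they are not the hierarchical clustering approaches or the penalty function proposed in the paper.

```python
from collections import defaultdict
from itertools import combinations


def build_blocks(records, block_key, max_size):
    """Group record ids by a blocking key, then naively split any block
    larger than max_size into consecutive chunks (hypothetical strategy)."""
    groups = defaultdict(list)
    for rid, rec in records.items():
        groups[block_key(rec)].append(rid)
    blocks = []
    for key in sorted(groups):
        ids = sorted(groups[key])
        for start in range(0, len(ids), max_size):
            blocks.append(ids[start:start + max_size])
    return blocks


def candidate_pairs(blocks):
    """Unordered record pairs that are compared because they share a block."""
    pairs = set()
    for block in blocks:
        for a, b in combinations(block, 2):
            pairs.add((min(a, b), max(a, b)))
    return pairs


def pairs_completeness(cands, true_matches):
    """Fraction of true matching pairs that appear among the candidate pairs."""
    return len(cands & true_matches) / len(true_matches)


def reduction_ratio(cands, n_records):
    """1 - (#candidate pairs / #all pairs), for deduplicating a single data set."""
    total = n_records * (n_records - 1) // 2
    return 1 - len(cands) / total


if __name__ == "__main__":
    records = {1: "smith", 2: "smyth", 3: "smith", 4: "jones", 5: "johns", 6: "smith"}
    true_matches = {(1, 3), (1, 6), (3, 6), (4, 5)}
    for cap in (3, 10):
        cands = candidate_pairs(build_blocks(records, lambda name: name[0], cap))
        pc = pairs_completeness(cands, true_matches)
        rr = reduction_ratio(cands, len(records))
        print(f"max_size={cap}: PC={pc:.2f} RR={rr:.3f}")
    # max_size=3: PC=0.50 RR=0.733
    # max_size=10: PC=1.00 RR=0.533
```

Capping blocks at three records lowers the number of comparisons but separates record 6 from its matches, which is the block quality versus block size trade-off that the paper's penalty function is meant to control.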

Query Workloads for Data Series Indexes

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pub Date: 2015-08-10. DOI: 10.1145/2783258.2783382

Konstantinos Zoumpatianos, Yin Lou, Themis Palpanas, J. Gehrke

Data series are a prevalent data type that has attracted considerable interest in recent years. Most of the research has focused on how to efficiently support similarity or nearest neighbor queries over large data series collections (an important data mining task), and several data series summarization and indexing methods have been proposed to solve this problem. Nevertheless, very little attention has so far been paid to properly evaluating such index structures: most previous work relies solely on randomly selected data series as queries (with or without added noise). In this work, we show that random workloads are inherently unsuitable for the task at hand, and we argue that query workloads need to be generated carefully. We define measures that capture the characteristics of queries, and we propose a method for generating workloads with the desired properties, that is, workloads that effectively evaluate and compare data series summarizations and indexes. In our experimental evaluation, with carefully controlled query workloads, we shed light on key factors affecting the performance of nearest neighbor search in large data series collections.

Citations: 31
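The data series abstract above criticizes the common practice of using randomly selected series, optionally with added noise, as queries for nearest neighbor search. The sketch below reproduces only that baseline practice, not the paper's workload-generation method: it builds a random-walk collection (a common synthetic generator, assumed here), derives a noisy random workload, and computes exact 1-NN answers by brute-force Euclidean distance, which is the ground truth an index would be checked against. Sizes, the noise level, and the seed are illustrative assumptions.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def random_walk(length, rng):
    """Synthetic data series: cumulative sum of standard Gaussian steps."""
    value, series = 0.0, []
    for _ in range(length):
        value += rng.gauss(0.0, 1.0)
        series.append(value)
    return series


def random_noisy_workload(dataset, n_queries, noise_std, rng):
    """Baseline workload: pick random series from the dataset and add Gaussian noise."""
    queries = []
    for _ in range(n_queries):
        base = rng.choice(dataset)
        queries.append([x + rng.gauss(0.0, noise_std) for x in base])
    return queries


def exact_nn(query, dataset):
    """Brute-force 1-NN: (index, distance) of the closest series in the dataset."""
    best = min(range(len(dataset)), key=lambda i: euclidean(query, dataset[i]))
    return best, euclidean(query, dataset[best])


if __name__ == "__main__":
    rng = random.Random(42)
    data = [random_walk(128, rng) for _ in range(200)]
    workload = random_noisy_workload(data, 10, 0.1, rng)
    answers = [exact_nn(q, data) for q in workload]
    print(answers[:3])
```

With `noise_std=0` every query's nearest neighbor is its own source series at distance 0, which makes the limitation concrete: such a workload says little about how an index behaves on harder searches.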

Learning Tree Structure in Multi-Task Learning

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pub Date: 2015-08-10. DOI: 10.1145/2783258.2783393

Lei Han, Yu Zhang

In multi-task learning (MTL), multiple related tasks are learned jointly by sharing information according to task relations. One promising approach is to use a given tree structure, which describes the hierarchical relations among tasks, to learn model parameters under the regularization framework. However, such a priori information is rarely available in most applications. To the best of our knowledge, no existing work learns the tree structure among tasks and the model parameters simultaneously under the regularization framework; in this paper, we develop a TAsk Tree (TAT) model for MTL that does so. Given the number of layers H in the tree, the TAT method decomposes the parameter matrix into H component matrices, each of which corresponds to the model parameters in one layer of the tree. To learn the tree structure, we devise sequential constraints that make the distance between the component parameters of each pair of tasks decrease over layers; hence, once two tasks' component parameters become fused in a layer, they remain fused up to the topmost layer. Moreover, to give the component parameters a chance to fuse in different layers, we develop a structural sparsity regularizer, defined as the sum of the ℓ2 norms of the pairwise differences among the component parameters, to learn layer-specific task structure. To solve the resulting non-convex objective function, we use the general iterative shrinkage and thresholding (GIST) method. Using the alternating direction method of multipliers (ADMM), we decompose the proximal problem in the GIST method into three independent subproblems; the key subproblem, which carries the sequential constraints, has an efficient solution, as do the other two. We also provide some theoretical analysis of the TAT model. Experiments on both synthetic and real-world datasets show the effectiveness of the TAT model.

Citations: 66
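The TAT abstract above describes three ingredients that can be checked directly on a candidate solution: the parameter matrix as a sum of H per-layer component matrices, the structural sparsity regularizer (sum of ℓ2 norms of pairwise task differences), and the sequential constraints (pairwise distances shrink toward the top layer, so fused tasks stay fused). The sketch below evaluates these on given component matrices. It assumes layer 0 is the bottom of the tree and layer H-1 the top, and the optional per-layer weights are a hypothetical addition; it does not implement the paper's GIST/ADMM solver.

```python
import math
from itertools import combinations


def l2(u, v):
    """Euclidean (l2) distance between two parameter vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def task_parameters(components):
    """Sum the H component matrices; components[h][t] is task t's vector in layer h."""
    n_tasks, dim = len(components[0]), len(components[0][0])
    return [[sum(layer[t][k] for layer in components) for k in range(dim)]
            for t in range(n_tasks)]


def structural_sparsity(components, layer_weights=None):
    """Sum over layers h and task pairs (t, s) of weight_h * ||w_t^h - w_s^h||_2."""
    weights = layer_weights or [1.0] * len(components)
    total = 0.0
    for weight, layer in zip(weights, components):
        for t, s in combinations(range(len(layer)), 2):
            total += weight * l2(layer[t], layer[s])
    return total


def satisfies_sequential(components, tol=1e-9):
    """True if each task pair's distance never grows from layer 0 (bottom) to layer H-1 (top)."""
    n_tasks = len(components[0])
    for t, s in combinations(range(n_tasks), 2):
        dists = [l2(layer[t], layer[s]) for layer in components]
        if any(upper > lower + tol for lower, upper in zip(dists, dists[1:])):
            return False
    return True


def fused_groups(layer, tol=1e-9):
    """Group tasks whose vectors in one layer coincide (compared to each group's first member)."""
    groups = []
    for t in range(len(layer)):
        for group in groups:
            if l2(layer[group[0]], layer[t]) <= tol:
                group.append(t)
                break
        else:
            groups.append([t])
    return groups


if __name__ == "__main__":
    bottom = [[1.0], [2.0], [5.0]]  # layer 0: all three tasks distinct
    top = [[0.0], [0.0], [1.0]]     # layer 1: tasks 0 and 1 fused
    comps = [bottom, top]
    print(task_parameters(comps))       # [[1.0], [2.0], [6.0]]
    print(structural_sparsity(comps))   # 10.0
    print(satisfies_sequential(comps))  # True
    print(fused_groups(top))            # [[0, 1], [2]]
```

Reading `fused_groups` layer by layer, from bottom to top, recovers a tree: groups can only merge as the layer index grows when the sequential check passes.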

Hierarchical Graph-Coupled HMMs for Heterogeneous Personalized Health Data

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pub Date: 2015-08-10. DOI: 10.1145/2783258.2783326

Kai Fan, Marisa C. Eisenberg, Alison Walsh, A. Aiello, K. Heller

Citations: 18

Cuckoo Linear Algebra

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pub Date: 2015-08-10. DOI: 10.1145/2783258.2783263

Li Zhou, D. Andersen, Mu Li, Alex Smola

Citations: 1

Deep Computational Phenotyping

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pub Date: 2015-08-10. DOI: 10.1145/2783258.2783365

Zhengping Che, David C. Kale, Wenzhe Li, M. T. Bahadori, Yan Liu

We apply deep learning to the problem of discovering and detecting characteristic patterns of physiology in clinical time series data. We propose two novel modifications to standard neural net training that address challenges and exploit properties that are peculiar, if not exclusive, to medical data. First, we examine a general framework for using prior knowledge to regularize parameters in the topmost layers. This framework can leverage priors of any form, ranging from formal ontologies (e.g., ICD-9 codes) to data-derived similarity. Second, we describe a scalable procedure for training a collection of neural networks of different sizes but with partially shared architectures. Both innovations are well suited to medical applications, where available data are not yet Internet-scale and have many sparse outputs (e.g., rare diagnoses) but also have exploitable structure (e.g., temporal order and relationships between labels). However, both techniques are general enough to be applied to other problems and domains. We demonstrate the empirical efficacy of both techniques on two real-world hospital data sets and show that the resulting neural nets learn interpretable and clinically relevant features.

Citations: 246
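The phenotyping abstract above regularizes the topmost-layer parameters with prior knowledge such as ontology-derived or data-derived label similarity. One common form of such a prior, assumed here and not necessarily the paper's exact formulation, is a graph-Laplacian penalty that pulls the output weight vectors of similar labels together: R(W) = ½ Σ_ij S_ij ‖w_i − w_j‖², whose gradient with respect to w_i is 2 Σ_j S_ij (w_i − w_j) for symmetric S. The similarity matrix `S`, the penalty `strength`, and the caller-supplied data-loss gradient are hypothetical inputs.

```python
def laplacian_penalty(W, S):
    """R(W) = 1/2 * sum_{i,j} S[i][j] * ||W[i] - W[j]||^2 (S symmetric, nonnegative)."""
    total = 0.0
    for i, w_i in enumerate(W):
        for j, w_j in enumerate(W):
            sq_dist = sum((a - b) ** 2 for a, b in zip(w_i, w_j))
            total += 0.5 * S[i][j] * sq_dist
    return total


def laplacian_penalty_grad(W, S):
    """Gradient of R: dR/dW[i] = 2 * sum_j S[i][j] * (W[i] - W[j]) for symmetric S."""
    n, d = len(W), len(W[0])
    grad = [[0.0] * d for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(d):
                grad[i][k] += 2.0 * S[i][j] * (W[i][k] - W[j][k])
    return grad


def regularized_step(W, data_grad, S, strength, lr):
    """One gradient step on data_loss + strength * R(W); data_grad comes from the caller."""
    reg_grad = laplacian_penalty_grad(W, S)
    return [[w - lr * (g + strength * r) for w, g, r in zip(w_row, g_row, r_row)]
            for w_row, g_row, r_row in zip(W, data_grad, reg_grad)]


if __name__ == "__main__":
    # Hypothetical output layer: one weight vector per label (rows), 2 hidden units.
    W = [[1.0, 0.0], [0.0, 0.0], [5.0, 5.0]]
    # Labels 0 and 1 are marked similar (e.g. sibling diagnosis codes); label 2 is unrelated.
    S = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    zero_grad = [[0.0, 0.0] for _ in W]
    W_next = regularized_step(W, zero_grad, S, strength=1.0, lr=0.1)
    print(laplacian_penalty(W), laplacian_penalty(W_next))
```

Only the rows of similar labels move toward each other; the unrelated label's weights are untouched by the penalty, which is how such a prior shares strength with rare labels without constraining the rest.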

Integrating Vertex-centric Clustering with Edge-centric Clustering for Meta Path Graph Analysis

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pub Date: 2015-08-10. DOI: 10.1145/2783258.2783328

Yang Zhou, Ling Liu, David J. Buttler

Meta paths are an effective mechanism for improving the quality of graph analysis on heterogeneous information networks. This paper presents a meta path graph clustering framework, VEPATHCLUSTER, that combines meta path vertex-centric clustering with meta path edge-centric clustering to improve the clustering quality of heterogeneous networks. First, we propose an edge-centric path graph model to capture the meta-path dependencies between pairwise path edges. We model a heterogeneous network containing M types of meta paths as M vertex-centric path graphs and M edge-centric path graphs. Second, we propose a clustering-based multigraph model to capture the fine-grained clustering-based relationships between pairwise vertices and between pairwise path edges. We perform clustering analysis on both a unified vertex-centric path graph and each edge-centric path graph to generate, respectively, a vertex clustering and edge clusterings of the original heterogeneous network. Third, we provide a reinforcement algorithm that tightly integrates vertex-centric clustering and edge-centric clustering by letting them mutually enhance each other. Finally, we present an iterative learning strategy that dynamically refines both vertex-centric and edge-centric clustering by continuously learning the contributions and adjusting the weights of different path graphs.

Citations: 31
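The VEPathCluster abstract above clusters a unified vertex-centric path graph built from M meta-path graphs and iteratively adjusts the weights of those path graphs. A minimal sketch of the unification step, assuming each path graph is a weighted adjacency matrix over the same vertices and the unified graph is their weighted sum. The update rule shown (weight each path graph by the share of its edge weight that falls inside the current clusters, then normalize) is a hypothetical heuristic for illustration, not the paper's learning strategy, and the clustering step itself is left out.

```python
def unify(path_graphs, weights):
    """Unified adjacency: A[i][j] = sum_m weights[m] * A_m[i][j]."""
    n = len(path_graphs[0])
    return [[sum(w * A[i][j] for w, A in zip(weights, path_graphs))
             for j in range(n)] for i in range(n)]


def intra_cluster_fraction(A, labels):
    """Share of a graph's total edge weight that connects vertices in the same cluster."""
    inside = total = 0.0
    n = len(A)
    for i in range(n):
        for j in range(n):
            total += A[i][j]
            if labels[i] == labels[j]:
                inside += A[i][j]
    return inside / total if total else 0.0


def reweight(path_graphs, labels):
    """Hypothetical update: weight each path graph by its agreement with the clustering."""
    scores = [intra_cluster_fraction(A, labels) for A in path_graphs]
    s = sum(scores)
    if s == 0:
        return [1.0 / len(path_graphs)] * len(path_graphs)
    return [x / s for x in scores]


if __name__ == "__main__":
    A1 = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # agrees with labels
    A2 = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]  # cuts across labels
    labels = [0, 0, 1, 1]  # current vertex clustering (assumed given)
    weights = reweight([A1, A2], labels)
    print(weights)                    # [1.0, 0.0]
    print(unify([A1, A2], weights)[0])
```

In a full loop, the unified graph would be re-clustered with the new weights and the two steps repeated until the weights stabilize.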

Stock Constrained Recommendation in Tmall

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pub Date: 2015-08-10. DOI: 10.1145/2783258.2788565

Leon Wenliang Zhong, Rong Jin, Cheng Yang, Xiaowei Yan, Qi Zhang, Qiang Li

A large number of recommender systems have been developed to serve users with interesting news, ads, products, or other content. One main limitation of existing work is that it does not take into account the inventory size of the items to be recommended. As a result, popular items are likely to go out of stock soon after they have been recommended and sold to many users, significantly affecting the impact of recommendation and the user experience. This observation motivates us to develop a novel stock-aware recommender system. It jointly optimizes the recommended items for all users based on both user preferences and the inventory sizes of different items. This requires solving a non-smooth optimization problem that involves estimating an n × n matrix, where n is the number of items. With the proliferation of items, this approach can quickly become computationally infeasible. We address this challenge by developing a dual method that reduces the number of variables from n² to n, significantly improving computational efficiency. We also extend this approach to the online setting, which is particularly important for big promotion events. Our empirical studies, based on a real benchmark data set with 100 million user visits from Tmall, verify the effectiveness of the proposed approach.

Citations: 18
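The Tmall abstract above attributes its speed-up to a dual method with one variable per item instead of an n × n primal matrix. One generic way to see why a dual view needs only per-item variables, offered as an assumption and not as the paper's formulation, is Lagrangian relaxation of the stock constraints: each item gets a price λ_i ≥ 0, every user is recommended the item with the highest preference score minus price, and projected subgradient steps raise the prices of over-demanded items. A final greedy pass then enforces the stock limits exactly. The scores, stock levels, step size `lr`, and iteration count are illustrative.

```python
def dual_prices(scores, stock, steps=50, lr=0.05):
    """Projected subgradient ascent on per-item prices (one dual variable per item).

    scores[u][i]: preference of user u for item i; stock[i]: units of item i available.
    """
    n_items = len(stock)
    prices = [0.0] * n_items
    for _ in range(steps):
        demand = [0] * n_items
        for row in scores:
            # Each user picks the item with the best price-adjusted score.
            best = max(range(n_items), key=lambda i: row[i] - prices[i])
            demand[best] += 1
        # Raise prices of over-demanded items; lower the rest but keep them >= 0.
        prices = [max(0.0, p + lr * (d - c))
                  for p, d, c in zip(prices, demand, stock)]
    return prices


def assign(scores, stock, prices):
    """Greedy repair: give each user the best price-adjusted item still in stock."""
    remaining = list(stock)
    result = []
    for row in scores:
        available = [i for i in range(len(stock)) if remaining[i] > 0]
        if not available:
            result.append(None)  # nothing left to recommend
            continue
        best = max(available, key=lambda i: row[i] - prices[i])
        remaining[best] -= 1
        result.append(best)
    return result


if __name__ == "__main__":
    scores = [[1.0, 0.9], [1.0, 0.5]]  # both users prefer item 0
    stock = [1, 1]                      # one unit of each item
    print(assign(scores, stock, [0.0, 0.0]))                  # [0, 1]
    print(assign(scores, stock, dual_prices(scores, stock)))  # [1, 0]
```

Without prices, the greedy pass gives item 0 to the first user it sees (total score 1.5). With learned prices, user 0, who loses less by switching, takes item 1 and user 1 gets item 0 (total score 1.9), illustrating how per-item prices coordinate recommendations across users under stock limits.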