Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining最新文献

筛选
英文 中文
From Group to Individual Labels Using Deep Features 使用深度特征从组标签到个人标签
Dimitrios Kotzias, Misha Denil, Nando de Freitas, Padhraic Smyth
{"title":"From Group to Individual Labels Using Deep Features","authors":"Dimitrios Kotzias, Misha Denil, Nando de Freitas, Padhraic Smyth","doi":"10.1145/2783258.2783380","DOIUrl":"https://doi.org/10.1145/2783258.2783380","url":null,"abstract":"In many classification problems labels are relatively scarce. One context in which this occurs is where we have labels for groups of instances but not for the instances themselves, as in multi-instance learning. Past work on this problem has typically focused on learning classifiers to make predictions at the group level. In this paper we focus on the problem of learning classifiers to make predictions at the instance level. To achieve this we propose a new objective function that encourages smoothness of inferred instance-level labels based on instance-level similarity, while at the same time respecting group-level label constraints. We apply this approach to the problem of predicting labels for sentences given labels for reviews, using a convolutional neural network to infer sentence similarity. The approach is evaluated using three large review data sets from IMDB, Yelp, and Amazon, and we demonstrate the proposed approach is both accurate and scalable compared to various alternatives.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"164 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122867316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 289
Model Multiple Heterogeneity via Hierarchical Multi-Latent Space Learning 基于层次多潜空间学习的多元异质性模型
Pei Yang, Jingrui He
{"title":"Model Multiple Heterogeneity via Hierarchical Multi-Latent Space Learning","authors":"Pei Yang, Jingrui He","doi":"10.1145/2783258.2783330","DOIUrl":"https://doi.org/10.1145/2783258.2783330","url":null,"abstract":"In many real world applications such as satellite image analysis, gene function prediction, and insider threat detection, the data collected from heterogeneous sources often exhibit multiple types of heterogeneity, such as task heterogeneity, view heterogeneity, and label heterogeneity. To address this problem, we propose a Hierarchical Multi-Latent Space (HiMLS) learning approach to jointly model the triple types of heterogeneity. The basic idea is to learn a hierarchical multi-latent space by which we can simultaneously leverage the task relatedness, view consistency and the label correlations to improve the learning performance. We first propose a multi-latent space framework to model the complex heterogeneity, which is used as a building block to stack up a multi-layer structure so as to learn the hierarchical multi-latent space. In such a way, we can gradually learn the more abstract concepts in the higher level. Then, a deep learning algorithm is proposed to solve the optimization problem. The experimental results on various data sets show the effectiveness of the proposed approach.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134022387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 13
A Clustering-Based Framework to Control Block Sizes for Entity Resolution 基于聚类的实体分辨率块大小控制框架
Jeffrey Fisher, P. Christen, Qing Wang, E. Rahm
{"title":"A Clustering-Based Framework to Control Block Sizes for Entity Resolution","authors":"Jeffrey Fisher, P. Christen, Qing Wang, E. Rahm","doi":"10.1145/2783258.2783396","DOIUrl":"https://doi.org/10.1145/2783258.2783396","url":null,"abstract":"Entity resolution (ER) is a common data cleaning task that involves determining which records from one or more data sets refer to the same real-world entities. Because a pairwise comparison of all records scales quadratically with the number of records in the data sets to be matched, it is common to use blocking or indexing techniques to reduce the number of comparisons required. These techniques split the data sets into blocks and only records within blocks are compared with each other. Most existing blocking techniques do not provide control over the size of the generated blocks, despite this control being important in many practical applications of ER, such as privacy-preserving record linkage and real-time ER. We propose two novel hierarchical clustering approaches which can generate blocks within a specified size range, and we present a penalty function which allows control of the trade-off between block quality and block size in the clustering process. We evaluate our techniques on three real-world data sets and compare them against three baseline approaches. The results show our proposed techniques perform well on the measures of pairs completeness and reduction ratio compared to the baseline approaches, while also satisfying the block size restrictions.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115222837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 61
Query Workloads for Data Series Indexes 查询数据系列索引的工作负载
Konstantinos Zoumpatianos, Yin Lou, Themis Palpanas, J. Gehrke
{"title":"Query Workloads for Data Series Indexes","authors":"Konstantinos Zoumpatianos, Yin Lou, Themis Palpanas, J. Gehrke","doi":"10.1145/2783258.2783382","DOIUrl":"https://doi.org/10.1145/2783258.2783382","url":null,"abstract":"Data series are a prevalent data type that has attracted lots of interest in recent years. Most of the research has focused on how to efficiently support similarity or nearest neighbor queries over large data series collections (an important data mining task), and several data series summarization and indexing methods have been proposed in order to solve this problem. Nevertheless, up to this point very little attention has been paid to properly evaluating such index structures, with most previous work relying solely on randomly selected data series to use as queries (with/without adding noise). In this work, we show that random workloads are inherently not suitable for the task at hand and we argue that there is a need for carefully generating a query workload. We define measures that capture the characteristics of queries, and we propose a method for generating workloads with the desired properties, that is, effectively evaluating and comparing data series summarizations and indexes. In our experimental evaluation, with carefully controlled query workloads, we shed light on key factors affecting the performance of nearest neighbor search in large data series collections.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115715120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 31
Learning Tree Structure in Multi-Task Learning 多任务学习中的学习树结构
Lei Han, Yu Zhang
{"title":"Learning Tree Structure in Multi-Task Learning","authors":"Lei Han, Yu Zhang","doi":"10.1145/2783258.2783393","DOIUrl":"https://doi.org/10.1145/2783258.2783393","url":null,"abstract":"In multi-task learning (MTL), multiple related tasks are learned jointly by sharing information according to task relations. One promising approach is to utilize the given tree structure, which describes the hierarchical relations among tasks, to learn model parameters under the regularization framework. However, such a priori information is rarely available in most applications. To the best of our knowledge, there is no work to learn the tree structure among tasks and model parameters simultaneously under the regularization framework and in this paper, we develop a TAsk Tree (TAT) model for MTL to achieve this. By specifying the number of layers in the tree as H, the TAT method decomposes the parameter matrix into H component matrices, each of which corresponds to the model parameters in each layer of the tree. In order to learn the tree structure, we devise sequential constraints to make the distance between the parameters in the component matrices corresponding to each pair of tasks decrease over layers, and hence the component parameters will keep fused until the topmost layer, once they become fused in a layer. Moreover, to make the component parameters have chance to fuse in different layers, we develop a structural sparsity regularizer, which is the sum of the l2 norm on the pairwise difference among the component parameters, to learn layer-specific task structure. In order to solve the resulting non-convex objective function, we use the general iterative shrinkage and thresholding (GIST) method. By using the alternating direction method of multipliers (ADMM) method, we decompose the proximal problem in the GIST method into three independent subproblems, where a key subproblem with the sequential constraints has an efficient solution as the other two subproblems do. We also provide some theoretical analysis for the TAT model. Experiments on both synthetic and real-world datasets show the effectiveness of the TAT model.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114209557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 66
Hierarchical Graph-Coupled HMMs for Heterogeneous Personalized Health Data 异构个性化健康数据的分层图耦合hmm
Kai Fan, Marisa C. Eisenberg, Alison Walsh, A. Aiello, K. Heller
{"title":"Hierarchical Graph-Coupled HMMs for Heterogeneous Personalized Health Data","authors":"Kai Fan, Marisa C. Eisenberg, Alison Walsh, A. Aiello, K. Heller","doi":"10.1145/2783258.2783326","DOIUrl":"https://doi.org/10.1145/2783258.2783326","url":null,"abstract":"The purpose of this study is to leverage modern technology (mobile or web apps) to enrich epidemiology data and infer the transmission of disease. We develop hierarchical Graph-Coupled Hidden Markov Models (hGCHMMs) to simultaneously track the spread of infection in a small cell phone community and capture person-specific infection parameters by leveraging a link prior that incorporates additional covariates. In this paper we investigate two link functions, the beta-exponential link and sigmoid link, both of which allow the development of a principled Bayesian hierarchical framework for disease transmission. The results of our model allow us to predict the probability of infection for each persons on each day, and also to infer personal physical vulnerability and the relevant association with covariates. We demonstrate our approach theoretically and experimentally on both simulation data and real epidemiological records.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117309534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 18
Cuckoo Linear Algebra 杜鹃线性代数
Li Zhou, D. Andersen, Mu Li, Alex Smola
{"title":"Cuckoo Linear Algebra","authors":"Li Zhou, D. Andersen, Mu Li, Alex Smola","doi":"10.1145/2783258.2783263","DOIUrl":"https://doi.org/10.1145/2783258.2783263","url":null,"abstract":"In this paper we present a novel data structure for sparse vectors based on Cuckoo hashing. It is highly memory efficient and allows for random access at near dense vector level rates. This allows us to solve sparse l1 programming problems exactly and without preprocessing at a cost that is identical to dense linear algebra both in terms of memory and speed. Our approach provides a feasible alternative to the hash kernel and it excels whenever exact solutions are required, such as for feature selection.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115845821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Deep Computational Phenotyping 深度计算表型
Zhengping Che, David C. Kale, Wenzhe Li, M. T. Bahadori, Yan Liu
{"title":"Deep Computational Phenotyping","authors":"Zhengping Che, David C. Kale, Wenzhe Li, M. T. Bahadori, Yan Liu","doi":"10.1145/2783258.2783365","DOIUrl":"https://doi.org/10.1145/2783258.2783365","url":null,"abstract":"We apply deep learning to the problem of discovery and detection of characteristic patterns of physiology in clinical time series data. We propose two novel modifications to standard neural net training that address challenges and exploit properties that are peculiar, if not exclusive, to medical data. First, we examine a general framework for using prior knowledge to regularize parameters in the topmost layers. This framework can leverage priors of any form, ranging from formal ontologies (e.g., ICD9 codes) to data-derived similarity. Second, we describe a scalable procedure for training a collection of neural networks of different sizes but with partially shared architectures. Both of these innovations are well-suited to medical applications, where available data are not yet Internet scale and have many sparse outputs (e.g., rare diagnoses) but which have exploitable structure (e.g., temporal order and relationships between labels). However, both techniques are sufficiently general to be applied to other problems and domains. We demonstrate the empirical efficacy of both techniques on two real-world hospital data sets and show that the resulting neural nets learn interpretable and clinically relevant features.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123502040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 246
Integrating Vertex-centric Clustering with Edge-centric Clustering for Meta Path Graph Analysis 融合顶点中心聚类和边缘中心聚类的元路径图分析
Yang Zhou, Ling Liu, David J. Buttler
{"title":"Integrating Vertex-centric Clustering with Edge-centric Clustering for Meta Path Graph Analysis","authors":"Yang Zhou, Ling Liu, David J. Buttler","doi":"10.1145/2783258.2783328","DOIUrl":"https://doi.org/10.1145/2783258.2783328","url":null,"abstract":"Meta paths are good mechanisms to improve the quality of graph analysis on heterogeneous information networks. This paper presents a meta path graph clustering framework, VEPATHCLUSTER, that combines meta path vertex-centric clustering with meta path edge-centric clustering for improving the clustering quality of heterogeneous networks. First, we propose an edge-centric path graph model to capture the meta-path dependencies between pairwise path edges. We model a heterogeneous network containing M types of meta paths as M vertex-centric path graphs and M edge-centric path graphs. Second, we propose a clustering-based multigraph model to capture the fine-grained clustering-based relationships between pairwise vertices and between pairwise path edges. We perform clustering analysis on both a unified vertex-centric path graph and each edge-centric path graph to generate vertex clustering and edge clusterings of the original heterogeneous network respectively. Third, a reinforcement algorithm is provided to tightly integrate vertex-centric clustering and edge-centric clustering by mutually enhancing each other. Finally, an iterative learning strategy is presented to dynamically refine both vertex-centric clustering and edge-centric clustering by continuously learning the contributions and adjusting the weights of different path graphs.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124942207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 31
Stock Constrained Recommendation in Tmall 天猫受限推荐
Leon Wenliang Zhong, Rong Jin, Cheng Yang, Xiaowei Yan, Qi Zhang, Qiang Li
{"title":"Stock Constrained Recommendation in Tmall","authors":"Leon Wenliang Zhong, Rong Jin, Cheng Yang, Xiaowei Yan, Qi Zhang, Qiang Li","doi":"10.1145/2783258.2788565","DOIUrl":"https://doi.org/10.1145/2783258.2788565","url":null,"abstract":"A large number of recommender systems have been developed to serve users with interesting news, ads, products or other contents. One main limitation with the existing work is that they do not take into account the inventory size of of items to be recommended. As a result, popular items are likely to be out of stock soon as they have been recommended and sold to many users, significantly affecting the impact of recommendation and user experience. This observation motivates us to develop a novel aware recommender system. It jointly optimizes the recommended items for all users based on both user preference and inventory sizes of different items. It requires solving a non-smooth optimization involved estimating a matrix of nxn, where n is the number of items. With the proliferation of items, this approach can quickly become computationally infeasible. We address this challenge by developing a dual method that reduces the number of variables from n^2 to n, significantly improving the computational efficiency. We also extend this approach to the online setting, which is particularly important for big promotion events. Our empirical studies based on a real benchmark data with 100 millions of user visits from Tmall verify the effectiveness of the proposed approach.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122122582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 18
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信