Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining最新文献

Discovering Non-Redundant K-means Clusterings in Optimal Subspaces 在最优子空间中发现非冗余K-means聚类

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2018-07-19 DOI: 10.1145/3219819.3219945

Dominik Mautz, Wei Ye, C. Plant, C. Böhm

引用次数: 24

Adaptive Paywall Mechanism for Digital News Media 数字新闻媒体的自适应付费墙机制

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2018-07-19 DOI: 10.1145/3219819.3219892

Heydar Davoudi, Aijun An, Morteza Zihayat, Gordon Edall

{"title":"Adaptive Paywall Mechanism for Digital News Media","authors":"Heydar Davoudi, Aijun An, Morteza Zihayat, Gordon Edall","doi":"10.1145/3219819.3219892","DOIUrl":"https://doi.org/10.1145/3219819.3219892","url":null,"abstract":"Many online news agencies utilize the paywall mechanism to increase reader subscriptions. This method offers a non-subscribed reader a fixed number of free articles in a period of time (e.g., a month), and then directs the user to the subscription page for further reading. We argue that there is no direct relationship between the number of paywalls presented to readers and the number of subscriptions, and that this artificial barrier, if not used well, may disengage potential subscribers and thus may not well serve its purpose of increasing revenue. Moreover, the current paywall mechanism neither considers the user browsing history nor the potential articles which the user may visit in the future. Thus, it treats all readers equally and does not consider the potential of a reader in becoming a subscriber. In this paper, we propose an adaptive paywall mechanism to balance the benefit of showing an article against that of displaying the paywall (i.e., terminating the session). We first define the notion of cost and utility that are used to define an objective function for optimal paywall decision making. Then, we model the problem as a stochastic sequential decision process. Finally, we propose an efficient policy function for paywall decision making. The experimental results on a real dataset from a major newspaper in Canada show that the proposed model outperforms the traditional paywall mechanism as well as the other baselines.","PeriodicalId":322066,"journal":{"name":"Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121316159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 7

On Interpretation of Network Embedding via Taxonomy Induction 基于分类归纳的网络嵌入解释

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2018-07-19 DOI: 10.1145/3219819.3220001

Ninghao Liu, Xiao Huang, Jundong Li, Xia Hu

{"title":"On Interpretation of Network Embedding via Taxonomy Induction","authors":"Ninghao Liu, Xiao Huang, Jundong Li, Xia Hu","doi":"10.1145/3219819.3220001","DOIUrl":"https://doi.org/10.1145/3219819.3220001","url":null,"abstract":"Network embedding has been increasingly used in many network analytics applications to generate low-dimensional vector representations, so that many off-the-shelf models can be applied to solve a wide variety of data mining tasks. However, similar to many other machine learning methods, network embedding results remain hard to be understood by users. Each dimension in the embedding space usually does not have any specific meaning, thus it is difficult to comprehend how the embedding instances are distributed in the reconstructed space. In addition, heterogeneous content information may be incorporated into network embedding, so it is challenging to specify which source of information is effective in generating the embedding results. In this paper, we investigate the interpretation of network embedding, aiming to understand how instances are distributed in embedding space, as well as explore the factors that lead to the embedding results. We resort to the post-hoc interpretation scheme, so that our approach can be applied to different types of embedding methods. Specifically, the interpretation of network embedding is presented in the form of a taxonomy. Effective objectives and corresponding algorithms are developed towards building the taxonomy. We also design several metrics to evaluate interpretation results. Experiments on real-world datasets from different domains demonstrate that, by comparing with the state-of-the-art alternatives, our approach produces effective and meaningful interpretation to embedding results.","PeriodicalId":322066,"journal":{"name":"Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125070001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 47

Learning Adversarial Networks for Semi-Supervised Text Classification via Policy Gradient 基于策略梯度学习半监督文本分类的对抗网络

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2018-07-19 DOI: 10.1145/3219819.3219956

Yan Li, Jieping Ye

{"title":"Learning Adversarial Networks for Semi-Supervised Text Classification via Policy Gradient","authors":"Yan Li, Jieping Ye","doi":"10.1145/3219819.3219956","DOIUrl":"https://doi.org/10.1145/3219819.3219956","url":null,"abstract":"Semi-supervised learning is a branch of machine learning techniques that aims to make fully use of both labeled and unlabeled instances to improve the prediction performance. The size of modern real world datasets is ever-growing so that acquiring label information for them is extraordinarily difficult and costly. Therefore, deep semi-supervised learning is becoming more and more popular. Most of the existing deep semi-supervised learning methods are built under the generative model based scheme, where the data distribution is approximated via input data reconstruction. However, this scheme does not naturally work on discrete data, e.g., text; in addition, learning a good data representation is sometimes directly opposed to the goal of learning a high performance prediction model. To address the issues of this type of methods, we reformulate the semi-supervised learning as a model-based reinforcement learning problem and propose an adversarial networks based framework. The proposed framework contains two networks: a predictor network for target estimation and a judge network for evaluation. The judge network iteratively generates proper reward to guide the training of predictor network, and the predictor network is trained via policy gradient. Based on the aforementioned framework, we propose a recurrent neural network based model for semi-supervised text classification. We conduct comprehensive experimental analysis on several real world benchmark text datasets, and the results from our evaluations show that our method outperforms other competing state-of-the-art methods.","PeriodicalId":322066,"journal":{"name":"Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123564743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 45

Count-Min: Optimal Estimation and Tight Error Bounds using Empirical Error Distributions Count-Min:使用经验误差分布的最优估计和紧误差范围

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2018-07-19 DOI: 10.1145/3219819.3219975

Daniel Ting

{"title":"Count-Min: Optimal Estimation and Tight Error Bounds using Empirical Error Distributions","authors":"Daniel Ting","doi":"10.1145/3219819.3219975","DOIUrl":"https://doi.org/10.1145/3219819.3219975","url":null,"abstract":"The Count-Min sketch is an important and well-studied data summarization method. It can estimate the count of any item in a stream using a small, fixed size data sketch. However, the accuracy of the Count-Min sketch depends on characteristics of the underlying data. This has led to a number of count estimation procedures which work well in one scenario but perform poorly in others. A practitioner is faced with two basic, unanswered questions. Given an estimate, what is its error? Which estimation procedure should be chosen when the data is unknown? We provide answers to these questions. We derive new count estimators, including a provably optimal estimator, which best or match previous estimators in all scenarios. We also provide practical, tight error bounds at query time for all estimators and methods to tune sketch parameters using these bounds. The key observation is that the full distribution of errors in each counter can be empirically estimated from the sketch itself. By first estimating this distribution, count estimation becomes a statistical estimation and inference problem with a known error distribution. This provides both a principled way to derive new and optimal estimators as well as a way to study the error and properties of existing estimators.","PeriodicalId":322066,"journal":{"name":"Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125518730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 27

Fatigue Prediction in Outdoor Runners Via Machine Learning and Sensor Fusion 基于机器学习和传感器融合的户外跑步者疲劳预测

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2018-07-19 DOI: 10.1145/3219819.3219864

Tim Op De Beéck, Wannes Meert, K. Schütte, B. Vanwanseele, Jesse Davis

引用次数: 38

Finding Similar Exercises in Online Education Systems 在在线教育系统中发现类似的练习

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2018-07-19 DOI: 10.1145/3219819.3219960

Qi Liu, Zai Huang, Zhenya Huang, Chuanren Liu, Enhong Chen, Yu-Ho Su, Guoping Hu

{"title":"Finding Similar Exercises in Online Education Systems","authors":"Qi Liu, Zai Huang, Zhenya Huang, Chuanren Liu, Enhong Chen, Yu-Ho Su, Guoping Hu","doi":"10.1145/3219819.3219960","DOIUrl":"https://doi.org/10.1145/3219819.3219960","url":null,"abstract":"In online education systems, finding similar exercises is a fundamental task of many applications, such as exercise retrieval and student modeling. Several approaches have been proposed for this task by simply using the specific textual content (e.g. the same knowledge concepts or the similar words) in exercises. However, the problem of how to systematically exploit the rich semantic information embedded in multiple heterogenous data (e.g. texts and images) to precisely retrieve similar exercises remains pretty much open. To this end, in this paper, we develop a novel Multimodal Attention-based Neural Network (MANN) framework for finding similar exercises in large-scale online education systems by learning a unified semantic representation from the heterogenous data. In MANN, given exercises with texts, images and knowledge concepts, we first apply a convolutional neural network to extract image representations and use an embedding layer for representing concepts. Then, we design an attention-based long short-term memory network to learn a unified semantic representation of each exercise in a multimodal way. Here, two attention strategies are proposed to capture the associations of texts and images, texts and knowledge concepts, respectively. Moreover, with a Similarity Attention, the similar parts in each exercise pair are also measured. Finally, we develop a pairwise training strategy for returning similar exercises. Extensive experimental results on real-world data clearly validate the effectiveness and the interpretation power of MANN.","PeriodicalId":322066,"journal":{"name":"Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122917542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 73

Large-Scale Order Dispatch in On-Demand Ride-Hailing Platforms: A Learning and Planning Approach 网约车平台的大规模订单调度:一种学习与规划方法

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2018-07-19 DOI: 10.1145/3219819.3219824

Zhe Xu, Zhixin Li, Qingwen Guan, Dingshui Zhang, Qiang Li, Junxiao Nan, Chunyang Liu, Wei Bian, Jieping Ye

{"title":"Large-Scale Order Dispatch in On-Demand Ride-Hailing Platforms: A Learning and Planning Approach","authors":"Zhe Xu, Zhixin Li, Qingwen Guan, Dingshui Zhang, Qiang Li, Junxiao Nan, Chunyang Liu, Wei Bian, Jieping Ye","doi":"10.1145/3219819.3219824","DOIUrl":"https://doi.org/10.1145/3219819.3219824","url":null,"abstract":"We present a novel order dispatch algorithm in large-scale on-demand ride-hailing platforms. While traditional order dispatch approaches usually focus on immediate customer satisfaction, the proposed algorithm is designed to provide a more efficient way to optimize resource utilization and user experience in a global and more farsighted view. In particular, we model order dispatch as a large-scale sequential decision-making problem, where the decision of assigning an order to a driver is determined by a centralized algorithm in a coordinated way. The problem is solved in a learning and planning manner: 1) based on historical data, we first summarize demand and supply patterns into a spatiotemporal quantization, each of which indicates the expected value of a driver being in a particular state; 2) a planning step is conducted in real-time, where each driver-order-pair is valued in consideration of both immediate rewards and future gains, and then dispatch is solved using a combinatorial optimizing algorithm. Through extensive offline experiments and online AB tests, the proposed approach delivers remarkable improvement on the platform's efficiency and has been successfully deployed in the production system of Didi Chuxing.","PeriodicalId":322066,"journal":{"name":"Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123020702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 300

Efficient Attribute Recommendation with Probabilistic Guarantee 具有概率保证的高效属性推荐

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2018-07-19 DOI: 10.1145/3219819.3219984

Chi Wang, K. Chakrabarti

引用次数: 6

Computational Advertising at Scale 大规模计算广告

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2018-07-19 DOI: 10.1145/3219819.3219932

Suju Rajan

引用次数: 0