Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining最新文献_第6页

Fake News Research: Theories, Detection Strategies, and Open Problems 假新闻研究:理论、检测策略和开放性问题

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2019-07-25 DOI: 10.1145/3292500.3332287

R. Zafarani, Xinyi Zhou, Kai Shu, Huan Liu

{"title":"Fake News Research: Theories, Detection Strategies, and Open Problems","authors":"R. Zafarani, Xinyi Zhou, Kai Shu, Huan Liu","doi":"10.1145/3292500.3332287","DOIUrl":"https://doi.org/10.1145/3292500.3332287","url":null,"abstract":"Fake news has become a global phenomenon due its explosive growth, particularly on social media. The goal of this tutorial is to (1) clearly introduce the concept and characteristics of fake news and how it can be formally differentiated from other similar concepts such as mis-/dis-information, satire news, rumors, among others, which helps deepen the understanding of fake news; (2) provide a comprehensive review of fundamental theories across disciplines and illustrate how they can be used to conduct interdisciplinary fake news research, facilitating a concerted effort of experts in computer and information science, political science, journalism, social science, psychology and economics. Such concerted efforts can result in highly efficient and explainable fake news detection; (3) systematically present fake news detection strategies from four perspectives (i.e., knowledge, style, propagation, and credibility) and the ways that each perspective utilizes techniques developed in data/graph mining, machine learning, natural language processing, and information retrieval; and (4) detail open issues within current fake news studies to reveal great potential research opportunities, hoping to attract researchers within a broader area to work on fake news detection and further facilitate its development. The tutorial aims to promote a fair, healthy and safe online information and news dissemination ecosystem, hoping to attract more researchers, engineers and students with various interests to fake news research. Few prerequisite are required for KDD participants to attend.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121472196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 52

State-Sharing Sparse Hidden Markov Models for Personalized Sequences 个性化序列的状态共享稀疏隐马尔可夫模型

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2019-07-25 DOI: 10.1145/3292500.3330828

Hongzhi Shi, Chao Zhang, Quanming Yao, Yong Li, Funing Sun, Depeng Jin

{"title":"State-Sharing Sparse Hidden Markov Models for Personalized Sequences","authors":"Hongzhi Shi, Chao Zhang, Quanming Yao, Yong Li, Funing Sun, Depeng Jin","doi":"10.1145/3292500.3330828","DOIUrl":"https://doi.org/10.1145/3292500.3330828","url":null,"abstract":"Hidden Markov Model (HMM) is a powerful tool that has been widely adopted in sequence modeling tasks, such as mobility analysis, healthcare informatics, and online recommendation. However, using HMM for modeling personalized sequences remains a challenging problem: training a unified HMM with all the sequences often fails to uncover interesting personalized patterns; yet training one HMM for each individual inevitably suffers from data scarcity. We address this challenge by proposing a state-sharing sparse hidden Markov model (S3HMM) that can uncover personalized sequential patterns without suffering from data scarcity. This is achieved by two design principles: (1) all the HMMs in the ensemble share the same set of latent states; and (2) each HMM has its own transition matrix to model the personalized transitions. The result optimization problem for S3HMM becomes nontrivial, because of its two-layer hidden state design and the non-convexity in parameter estimation. We design a new Expectation-Maximization algorithm based, which treats the difference of convex programming as a sub-solver to optimize the non-convex function in the M-step with convergence guarantee. Our experimental results show that, S3HMM can successfully uncover personalized sequential patterns in various applications and outperforms baselines significantly in downstream prediction tasks.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114159633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 13

Buying or Browsing?: Predicting Real-time Purchasing Intent using Attention-based Deep Network with Multiple Behavior 购买还是浏览?基于多重行为的基于注意力的深度网络预测实时购买意图

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2019-07-25 DOI: 10.1145/3292500.3330670

Long Guo, Lifeng Hua, Rongfei Jia, Binqiang Zhao, Xiaobo Wang, B. Cui

{"title":"Buying or Browsing?: Predicting Real-time Purchasing Intent using Attention-based Deep Network with Multiple Behavior","authors":"Long Guo, Lifeng Hua, Rongfei Jia, Binqiang Zhao, Xiaobo Wang, B. Cui","doi":"10.1145/3292500.3330670","DOIUrl":"https://doi.org/10.1145/3292500.3330670","url":null,"abstract":"E-commerce platforms are becoming a primary place for people to find, compare and ultimately purchase products. One of the fundamental questions that arises in e-commerce is to predict user purchasing intent, which is an important part of user understanding and allows for providing better services for both sellers and customers. However, previous work cannot predict real-time user purchasing intent with a high accuracy, limited by the representation capability of traditional browse-interactive behavior adopted. In this paper, we propose a novel end-to-end deep network, named Deep Intent Prediction Network (DIPN), to predict real-time user purchasing intent. In particular, besides the traditional browse-interactive behavior, we collect a new type of user interactive behavior, called touch-interactive behavior, which can capture more fine-grained real-time user features. To combine these behavior effectively, we propose a hierarchical attention mechanism, where the bottom attention layer focuses on the inner parts of each behavior sequence while the top attention layer learns the inter-view relations between different behavior sequences. In addition, we propose to train DIPN with multi-task learning to better distinguish user behavior patterns. In the experiments conducted on a large-scale industrial dataset, DIPN significantly outperforms the baseline solutions. Notably, DIPN gains about 18.96% improvement on AUC than the state-of-the-art solution only using traditional browse-interactive behavior sequences. Moreover, DIPN has been deployed in the operational system of Taobao. Online A/B testing results with more than 12.9 millions of users reveal the potential of knowing users' real-time purchasing intent.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121673662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 67

Learning Sleep Quality from Daily Logs 从日常日志中学习睡眠质量

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2019-07-25 DOI: 10.1145/3292500.3330792

S. Park, Cheng-te Li, Sungwon Han, Cheng-Mao Hsu, Sang Won Lee, M. Cha

引用次数: 11

Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network 基于随机递归神经网络的多元时间序列鲁棒异常检测

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2019-07-25 DOI: 10.1145/3292500.3330672

Ya Su, Youjian Zhao, Chenhao Niu, Rong Liu, Wei Sun, Dan Pei

{"title":"Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network","authors":"Ya Su, Youjian Zhao, Chenhao Niu, Rong Liu, Wei Sun, Dan Pei","doi":"10.1145/3292500.3330672","DOIUrl":"https://doi.org/10.1145/3292500.3330672","url":null,"abstract":"Industry devices (i.e., entities) such as server machines, spacecrafts, engines, etc., are typically monitored with multivariate time series, whose anomaly detection is critical for an entity's service quality management. However, due to the complex temporal dependence and stochasticity of multivariate time series, their anomaly detection remains a big challenge. This paper proposes OmniAnomaly, a stochastic recurrent neural network for multivariate time series anomaly detection that works well robustly for various devices. Its core idea is to capture the normal patterns of multivariate time series by learning their robust representations with key techniques such as stochastic variable connection and planar normalizing flow, reconstruct input data by the representations, and use the reconstruction probabilities to determine anomalies. Moreover, for a detected entity anomaly, OmniAnomaly can provide interpretations based on the reconstruction probabilities of its constituent univariate time series. The evaluation experiments are conducted on two public datasets from aerospace and a new server machine dataset (collected and released by us) from an Internet company. OmniAnomaly achieves an overall F1-Score of 0.86 in three real-world datasets, signicantly outperforming the best performing baseline method by 0.09. The interpretation accuracy for OmniAnomaly is up to 0.89.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123859958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 584

Deep Bayesian Mining, Learning and Understanding 深度贝叶斯挖掘，学习和理解

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2019-07-25 DOI: 10.1145/3292500.3332267

Jen-Tzung Chien

{"title":"Deep Bayesian Mining, Learning and Understanding","authors":"Jen-Tzung Chien","doi":"10.1145/3292500.3332267","DOIUrl":"https://doi.org/10.1145/3292500.3332267","url":null,"abstract":"This tutorial addresses the advances in deep Bayesian mining and learning for natural language with ubiquitous applications ranging from speech recognition to document summarization, text classification, text segmentation, information extraction, image caption generation, sentence generation, dialogue control, sentiment classification, recommendation system, question answering and machine translation, to name a few. Traditionally, \"deep learning\" is taken to be a learning process where the inference or optimization is based on the real-valued deterministic model. The \"semantic structure\" in words, sentences, entities, actions and documents drawn from a large vocabulary may not be well expressed or correctly optimized in mathematical logic or computer programs. The \"distribution function\" in discrete or continuous latent variable model for natural language may not be properly decomposed or estimated. This tutorial addresses the fundamentals of statistical models and neural networks, and focus on a series of advanced Bayesian models and deep models including hierarchical Dirichlet process, Chinese restaurant process, hierarchical Pitman-Yor process, Indian buffet process, recurrent neural network, long short-term memory, sequence-to-sequence model, variational auto-encoder, generative adversarial network, attention mechanism, memory-augmented neural network, skip neural network, stochastic neural network, predictive state neural network, policy neural network. We present how these models are connected and why they work for a variety of applications on symbolic and complex patterns in natural language. The variational inference and sampling method are formulated to tackle the optimization for complicated models. The word and sentence embeddings, clustering and co-clustering are merged with linguistic and semantic constraints. A series of case studies are presented to tackle different issues in deep Bayesian mining, learning and understanding. At last, we will point out a number of directions and outlooks for future studies.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124934251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

Revisiting kd-tree for Nearest Neighbor Search 重访kd树进行最近邻搜索

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2019-07-25 DOI: 10.1145/3292500.3330875

P. Ram, Kaushik Sinha

引用次数: 59

Contextual Fact Ranking and Its Applications in Table Synthesis and Compression 上下文事实排序及其在表合成和压缩中的应用

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2019-07-25 DOI: 10.1145/3292500.3330980

Silu Huang, Jialu Liu, Flip Korn, Xuezhi Wang, You Wu, Dale Markowitz, Cong Yu

{"title":"Contextual Fact Ranking and Its Applications in Table Synthesis and Compression","authors":"Silu Huang, Jialu Liu, Flip Korn, Xuezhi Wang, You Wu, Dale Markowitz, Cong Yu","doi":"10.1145/3292500.3330980","DOIUrl":"https://doi.org/10.1145/3292500.3330980","url":null,"abstract":"Modern search engines increasingly incorporate tabular content, which consists of a set of entities each augmented with a small set of facts. The facts can be obtained from multiple sources: an entity's knowledge base entry, the infobox on its Wikipedia page, or its row within a WebTable. Crucially, the informativeness of a fact depends not only on the entity but also the specific context(e.g., the query).To the best of our knowledge, this paper is the first to study the problem of contextual fact ranking: given some entities and a context (i.e., succinct natural language description), identify the most informative facts for the entities collectively within the context.We propose to contextually rank the facts by exploiting deep learning techniques. In particular, we develop pointwise and pair-wise ranking models, using textual and statistical information for the given entities and context derived from their sources. We enhance the models by incorporating entity type information from an IsA (hypernym) database. We demonstrate that our approaches achieve better performance than state-of-the-art baselines in terms of MAP, NDCG, and recall. We further conduct user studies for two specific applications of contextual fact ranking-table synthesis and table compression-and show that our models can identify more informative facts than the baselines.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122797703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 7

MVAN: Multi-view Attention Networks for Real Money Trading Detection in Online Games 基于多视角注意力网络的在线游戏真钱交易检测

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2019-07-25 DOI: 10.1145/3292500.3330687

Jianrong Tao, Jianshi Lin, Shize Zhang, Sha Zhao, Runze Wu, Changjie Fan, Peng Cui

{"title":"MVAN: Multi-view Attention Networks for Real Money Trading Detection in Online Games","authors":"Jianrong Tao, Jianshi Lin, Shize Zhang, Sha Zhao, Runze Wu, Changjie Fan, Peng Cui","doi":"10.1145/3292500.3330687","DOIUrl":"https://doi.org/10.1145/3292500.3330687","url":null,"abstract":"Online gaming is a multi-billion dollar industry that entertains a large, global population. However, one unfortunate phenomenon known as real money trading harms the competition and the fun. Real money trading is an interesting economic activity used to exchange assets in a virtual world with real world currencies, leading to imbalance of game economy and inequality of wealth and opportunity. Game operation teams have been devoting much efforts on real money trading detection, however, it still remains a challenging task. To overcome the limitation from traditional methods conducted by game operation teams, we propose, MVAN, the first multi-view attention networks for detecting real money trading with multi-view data sources. We present a multi-graph attention network (MGAT) in the graph structure view, a behavior attention network (BAN) in the vertex content view, a portrait attention network (PAN) in the vertex attribute view and a data source attention network (DSAN) in the data source view. Experiments conducted on real-world game logs from a commercial NetEase MMORPG( JusticePC) show that our method consistently performs promising results compared with other competitive methods over time and verifiy the importance and rationality of attention mechanisms. MVAN is deployed to several MMORPGs in NetEase in practice and achieving remarkable performance improvement and acceleration. Our method can easily generalize to other types of related tasks in real world, such as fraud detection, drug tracking and money laundering tracking etc.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122908984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 23

Tensorized Determinantal Point Processes for Recommendation 用于推荐的张张化行列式点过程

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2019-07-25 DOI: 10.1145/3292500.3330952

Romain Warlop, Jérémie Mary, Mike Gartrell

引用次数: 18