{"title":"Workshop on Model Mining","authors":"Shan You, Chang Xu, Fei Wang, Changshui Zhang","doi":"10.1145/3447548.3469471","DOIUrl":"https://doi.org/10.1145/3447548.3469471","url":null,"abstract":"How to mine the knowledge in the pretrained models is of significance in achieving more promising performance, since practitioners have access to many pretrained models easily. This Workshop on Model Mining aims to investigate more diverse and advanced manners in mining knowledge within models, which tends to leverage the pretrained models more wisely, elegantly and systematically. There are many topics related to this workshop, such as distilling a lightweight model from a well-trained heavy model via teacher-student paradigm, and boosting the performance of the model by carefully designing the predecessor tasks, e.g., pre-training, self-supervised and contrastive learning. Model mining as a special way of data mining is relevant to SIGKDD, and its audience including researchers and engineers will benefit a lot for designing more advanced algorithms for their tasks.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"249 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115233739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Curriculum Meta-Learning for Next POI Recommendation","authors":"Yudong Chen, Xin Wang, M. Fan, Jizhou Huang, Shengwen Yang, Wenwu Zhu","doi":"10.1145/3447548.3467132","DOIUrl":"https://doi.org/10.1145/3447548.3467132","url":null,"abstract":"Next point-of-interest (POI) recommendation is a hot research field where a recent emerging scenario, next POI to search recommendation, has been deployed in many online map services such as Baidu Maps. One of the key issues in this scenario is providing satisfactory recommendation services for cold-start cities with a limited number of user-POI interactions, which requires transferring the knowledge hidden in rich data from many other cities to these cold-start cities. Existing literature either does not consider the city-transfer issue or cannot simultaneously tackle the data sparsity and pattern diversity issues among various users in multiple cities. To address these issues, we explore city-transfer next POI to search recommendation that transfers the knowledge from multiple cities with rich data to cold-start cities with scarce data. We propose a novel Curriculum Hardness Aware Meta-Learning (CHAML) framework, which incorporates hard sample mining and curriculum learning into a meta-learning paradigm. Concretely, the CHAML framework considers both city-level and user-level hardness to enhance the conditional sampling during meta training, and uses an easy-to-hard curriculum for the city-sampling pool to help the meta-learner converge to a better state. Extensive experiments on two real-world map search datasets from Baidu Maps demonstrate the superiority of CHAML framework.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127173937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Addressing Non-Representative Surveys using Multiple Instance Learning","authors":"Yaniv Katz, O. Vainas","doi":"10.1145/3447548.3467109","DOIUrl":"https://doi.org/10.1145/3447548.3467109","url":null,"abstract":"In recent years, non representative survey sampling and non response bias constitute major obstacles in obtaining a reliable population quantity estimate from finite survey samples. As such, researchers have been focusing on identifying methods to resolve these biases. In this paper, we look at this well known problem from a fresh perspective, and formulate it as a learning problem. To meet this challenge, we suggest solving the learning problem using a multiple instance learning (MIL) paradigm. We devise two different MIL based neural network topologies, each based on a different implementation of an attention pooling layer. These models are trained to accurately infer the population quantity of interest even when facing a biased sample. To the best of our knowledge, this is the first time MIL has ever been suggested as a solution to this problem. In contrast to commonly used statistical methods, this approach can be accomplished without having to collect sensitive personal data of the respondents and without having to access population level statistics of the same sensitive data. To validate the effectiveness of our approaches, we test them on a real-world movie rating dataset which is used to mimic a biased survey by experimentally contaminating it with different kinds of survey bias. We show that our suggested topologies outperform other MIL architectures, and are able to partly counter the adverse effect of biased sampling on the estimation quality. We also demonstrate how these methods can be easily adapted to perform well even when part of the survey is based on a small number of respondents.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127212991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Differentiable Pattern Set Mining","authors":"Jonas Fischer, Jilles Vreeken","doi":"10.1145/3447548.3467348","DOIUrl":"https://doi.org/10.1145/3447548.3467348","url":null,"abstract":"Pattern set mining has been successful in discovering small sets of highly informative and useful patterns from data. To find good models, existing methods heuristically explore the twice-exponential search space over all possible pattern sets in a combinatorial way, by which they are limited to data over at most hundreds of features, as well as likely to get stuck in local minima. Here, we propose a gradient based optimization approach that allows us to efficiently discover high-quality pattern sets from data of millions of rows and hundreds of thousands of features. In particular, we propose a novel type of neural autoencoder called BinaPs, using binary activations and binarizing weights in each forward pass, which are directly interpretable as conjunctive patterns. For training, optimizing a data-sparsity aware reconstruction loss, continuous versions of the weights are learned in small, noisy steps. This formulation provides a link between the discrete search space and continuous optimization, thus allowing for a gradient based strategy to discover sets of high-quality and noise-robust patterns. Through extensive experiments on both synthetic and real world data, we show that BinaPs discovers high quality and noise robust patterns, and unique among all competitors, easily scales to data of supermarket transactions or biological variant calls.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124934347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Third International TrueFact Workshop: Making a Credible Web for Tomorrow","authors":"Subhabrata Mukherjee, Qi Li, Sihong Xie, Philip S. Yu, Jing Gao","doi":"10.1145/3447548.3469467","DOIUrl":"https://doi.org/10.1145/3447548.3469467","url":null,"abstract":"The Third International TrueFact Workshop: Making a Credible Web for Tomorrow is geared towards bringing academic, industry and government researchers and practitioners together to tackle the challenges in misinformation, data quality, truth finding, fact-checking, credibility analysis and rumor detection -- in heterogeneous and multi-modal sources of information including texts, images, videos, relational data, social networks and knowledge graphs.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125940136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Second International MIS2 Workshop: Misinformation and Misbehavior Mining on the Web","authors":"A. Hofleitner, Meng Jiang, Srijan Kumar, Neil Shah, Kai Shu","doi":"10.1145/3447548.3469443","DOIUrl":"https://doi.org/10.1145/3447548.3469443","url":null,"abstract":"Misinformation and misbehavior mining on the web (MIS2) workshop is held virtually on August 14, 2021 and is co-located with the ACM SIGKDD 2021 conference. The web has become a breeding ground for misbehavior and misinformation. It is timely and crucial to understand, detect, forecast, and mitigate their harm. MIS2 workshop as an interdisciplinary venue for researchers and practitioners who study the dark side of the web. The workshop program includes a peer-reviewed set of paper presentations and keynote talks, giving the attendees an immersive experience of this research field.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"538 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123245811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"KDD 2021 Tutorial on Systemic Challenges and Solutions on Bias and Unfairness in Peer Review","authors":"Nihar B. Shah","doi":"10.1145/3447548.3470826","DOIUrl":"https://doi.org/10.1145/3447548.3470826","url":null,"abstract":"Introduction. Peer review is a cornerstone of academic practice [1]. The peer review process is highly regarded by the vast majority of researchers and considered by most to be essential to the communication of scholarly research [2–4]. However, there is also an overwhelming desire for improvement [2, 4, 5]. Problems in peer review have consequences much beyond the outcome for a specific paper or grant, particularly due to the widespread prevalence of the Matthew effect (“rich get richer”) in academia [6]. As noted by [7] “an incompetent review may lead to the rejection of the submitted paper, or of the grant application, and the ultimate failure of the career of the author.” (See also [8, 9].) The importance of peer review and the urgent need for improvements, behooves research on principled approaches towards addressing problems in peer review, particularly at scale. In this tutorial, we discuss a number of key challenges in peer review, outline several directions of research on this topic, and also highlight important open problems that we envisage to be exciting to the community. This document summarizes the contents of the tutorial and provides relevant references.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123701843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"H2MN","authors":"Zhen Zhang, Jiajun Bu, M. Ester, Z. Li, Chengwei Yao, Zhi Yu, Can Wang","doi":"10.1145/3447548.3467328","DOIUrl":"https://doi.org/10.1145/3447548.3467328","url":null,"abstract":"Graph similarity learning, which measures the similarities between a pair of graph-structured objects, lies at the core of various machine learning tasks such as graph classification, similarity search, etc. In this paper, we devise a novel graph neural network based framework to address this challenging problem, motivated by its great success in graph representation learning. As the vast majority of existing graph neural network models mainly concentrate on learning effective node or graph level representations of a single graph, little effort has been made to jointly reason over a pair of graph-structured inputs for graph similarity learning. To this end, we propose Hierarchical Hypergraph Matching Networks (H2sup>MN) to calculate the similarities between graph pairs with arbitrary structure. Specifically, our proposed H2MN learns graph representation from the perspective of hypergraph, and takes each hyperedge as a subgraph to perform subgraph matching, which could capture the rich substructure similarities across the graph. To enable hierarchical graph representation and fast similarity computation, we further propose a hyperedge pooling operator to transform each graph into a coarse graph of reduced size. Then, a multi-perspective cross-graph matching layer is employed on the coarsened graph pairs to extract the inter-graph similarity. Comprehensive experiments on five public datasets empirically demonstrate that our proposed model can outperform state-of-the-art baselines with different gains for graph-graph classification and regression tasks.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125384072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain-oriented Language Modeling with Adaptive Hybrid Masking and Optimal Transport Alignment","authors":"Denghui Zhang, Zixuan Yuan, Yanchi Liu, Hao Liu, Fuzhen Zhuang, Hui Xiong, Haifeng Chen","doi":"10.1145/3447548.3467215","DOIUrl":"https://doi.org/10.1145/3447548.3467215","url":null,"abstract":"Motivated by the success of pre-trained language models such as BERT in a broad range of natural language processing (NLP) tasks, recent research efforts have been made for adapting these models for different application domains. Along this line, existing domain-oriented models have primarily followed the vanilla BERT architecture and have a straightforward use of the domain corpus. However, domain-oriented tasks usually require accurate understanding of domain phrases, and such fine-grained phrase-level knowledge is hard to be captured by existing pre-training scheme. Also, the word co-occurrences guided semantic learning of pre-training models can be largely augmented by entity-level association knowledge. But meanwhile, there is a risk of introducing noise due to the lack of groundtruth word-level alignment. To address the issues, we provide a generalized domain-oriented approach, which leverages auxiliary domain knowledge to improve the existing pre-training framework from two aspects. First, to preserve phrase knowledge effectively, we build a domain phrase pool as auxiliary knowledge, meanwhile we introduce Adaptive Hybrid Masked Model to incorporate such knowledge. It integrates two learning modes, word learning and phrase learning, and allows them to switch between each other. Second, we introduce Cross Entity Alignment to leverage entity association as weak supervision to augment the semantic learning of pre-trained models. To alleviate the potential noise in this process, we introduce an interpretableOptimal Transport based approach to guide alignment learning. Experiments on four domain-oriented tasks demonstrate the superiority of our framework.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114919299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Incremental Computation of Aggregations over Sliding Windows","authors":"Chao Zhang, Reza Akbarinia, F. Toumani","doi":"10.1145/3447548.3467360","DOIUrl":"https://doi.org/10.1145/3447548.3467360","url":null,"abstract":"Computing aggregation over sliding windows, i.e., finite subsets of an unbounded stream, is a core operation in streaming analytics. We propose PBA (Parallel Boundary Aggregator), a novel parallel algorithm that groups continuous slices of streaming values into chunks and exploits two buffers, cumulative slice aggregations and left cumulative slice aggregations, to compute sliding window aggregations efficiently. PBA runs in O(1) time, performing at most 3 merging operations per slide while consuming O(n) space for windows with n partial aggregations. Our empirical experiments demonstrate that PBA can improve throughput up to 4X while reducing latency, compared to state-of-the-art algorithms.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116030744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}