{"title":"Workshop on Model Mining","authors":"Shan You, Chang Xu, Fei Wang, Changshui Zhang","doi":"10.1145/3447548.3469471","DOIUrl":"https://doi.org/10.1145/3447548.3469471","url":null,"abstract":"How to mine the knowledge in the pretrained models is of significance in achieving more promising performance, since practitioners have access to many pretrained models easily. This Workshop on Model Mining aims to investigate more diverse and advanced manners in mining knowledge within models, which tends to leverage the pretrained models more wisely, elegantly and systematically. There are many topics related to this workshop, such as distilling a lightweight model from a well-trained heavy model via teacher-student paradigm, and boosting the performance of the model by carefully designing the predecessor tasks, e.g., pre-training, self-supervised and contrastive learning. Model mining as a special way of data mining is relevant to SIGKDD, and its audience including researchers and engineers will benefit a lot for designing more advanced algorithms for their tasks.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"249 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115233739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Curriculum Meta-Learning for Next POI Recommendation","authors":"Yudong Chen, Xin Wang, M. Fan, Jizhou Huang, Shengwen Yang, Wenwu Zhu","doi":"10.1145/3447548.3467132","DOIUrl":"https://doi.org/10.1145/3447548.3467132","url":null,"abstract":"Next point-of-interest (POI) recommendation is a hot research field where a recent emerging scenario, next POI to search recommendation, has been deployed in many online map services such as Baidu Maps. One of the key issues in this scenario is providing satisfactory recommendation services for cold-start cities with a limited number of user-POI interactions, which requires transferring the knowledge hidden in rich data from many other cities to these cold-start cities. Existing literature either does not consider the city-transfer issue or cannot simultaneously tackle the data sparsity and pattern diversity issues among various users in multiple cities. To address these issues, we explore city-transfer next POI to search recommendation that transfers the knowledge from multiple cities with rich data to cold-start cities with scarce data. We propose a novel Curriculum Hardness Aware Meta-Learning (CHAML) framework, which incorporates hard sample mining and curriculum learning into a meta-learning paradigm. Concretely, the CHAML framework considers both city-level and user-level hardness to enhance the conditional sampling during meta training, and uses an easy-to-hard curriculum for the city-sampling pool to help the meta-learner converge to a better state. Extensive experiments on two real-world map search datasets from Baidu Maps demonstrate the superiority of CHAML framework.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127173937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Addressing Non-Representative Surveys using Multiple Instance Learning","authors":"Yaniv Katz, O. Vainas","doi":"10.1145/3447548.3467109","DOIUrl":"https://doi.org/10.1145/3447548.3467109","url":null,"abstract":"In recent years, non representative survey sampling and non response bias constitute major obstacles in obtaining a reliable population quantity estimate from finite survey samples. As such, researchers have been focusing on identifying methods to resolve these biases. In this paper, we look at this well known problem from a fresh perspective, and formulate it as a learning problem. To meet this challenge, we suggest solving the learning problem using a multiple instance learning (MIL) paradigm. We devise two different MIL based neural network topologies, each based on a different implementation of an attention pooling layer. These models are trained to accurately infer the population quantity of interest even when facing a biased sample. To the best of our knowledge, this is the first time MIL has ever been suggested as a solution to this problem. In contrast to commonly used statistical methods, this approach can be accomplished without having to collect sensitive personal data of the respondents and without having to access population level statistics of the same sensitive data. To validate the effectiveness of our approaches, we test them on a real-world movie rating dataset which is used to mimic a biased survey by experimentally contaminating it with different kinds of survey bias. We show that our suggested topologies outperform other MIL architectures, and are able to partly counter the adverse effect of biased sampling on the estimation quality. We also demonstrate how these methods can be easily adapted to perform well even when part of the survey is based on a small number of respondents.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127212991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Differentiable Pattern Set Mining","authors":"Jonas Fischer, Jilles Vreeken","doi":"10.1145/3447548.3467348","DOIUrl":"https://doi.org/10.1145/3447548.3467348","url":null,"abstract":"Pattern set mining has been successful in discovering small sets of highly informative and useful patterns from data. To find good models, existing methods heuristically explore the twice-exponential search space over all possible pattern sets in a combinatorial way, by which they are limited to data over at most hundreds of features, as well as likely to get stuck in local minima. Here, we propose a gradient based optimization approach that allows us to efficiently discover high-quality pattern sets from data of millions of rows and hundreds of thousands of features. In particular, we propose a novel type of neural autoencoder called BinaPs, using binary activations and binarizing weights in each forward pass, which are directly interpretable as conjunctive patterns. For training, optimizing a data-sparsity aware reconstruction loss, continuous versions of the weights are learned in small, noisy steps. This formulation provides a link between the discrete search space and continuous optimization, thus allowing for a gradient based strategy to discover sets of high-quality and noise-robust patterns. Through extensive experiments on both synthetic and real world data, we show that BinaPs discovers high quality and noise robust patterns, and unique among all competitors, easily scales to data of supermarket transactions or biological variant calls.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124934347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Third International TrueFact Workshop: Making a Credible Web for Tomorrow","authors":"Subhabrata Mukherjee, Qi Li, Sihong Xie, Philip S. Yu, Jing Gao","doi":"10.1145/3447548.3469467","DOIUrl":"https://doi.org/10.1145/3447548.3469467","url":null,"abstract":"The Third International TrueFact Workshop: Making a Credible Web for Tomorrow is geared towards bringing academic, industry and government researchers and practitioners together to tackle the challenges in misinformation, data quality, truth finding, fact-checking, credibility analysis and rumor detection -- in heterogeneous and multi-modal sources of information including texts, images, videos, relational data, social networks and knowledge graphs.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125940136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Second International MIS2 Workshop: Misinformation and Misbehavior Mining on the Web","authors":"A. Hofleitner, Meng Jiang, Srijan Kumar, Neil Shah, Kai Shu","doi":"10.1145/3447548.3469443","DOIUrl":"https://doi.org/10.1145/3447548.3469443","url":null,"abstract":"Misinformation and misbehavior mining on the web (MIS2) workshop is held virtually on August 14, 2021 and is co-located with the ACM SIGKDD 2021 conference. The web has become a breeding ground for misbehavior and misinformation. It is timely and crucial to understand, detect, forecast, and mitigate their harm. MIS2 workshop as an interdisciplinary venue for researchers and practitioners who study the dark side of the web. The workshop program includes a peer-reviewed set of paper presentations and keynote talks, giving the attendees an immersive experience of this research field.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"538 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123245811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"KDD 2021 Tutorial on Systemic Challenges and Solutions on Bias and Unfairness in Peer Review","authors":"Nihar B. Shah","doi":"10.1145/3447548.3470826","DOIUrl":"https://doi.org/10.1145/3447548.3470826","url":null,"abstract":"Introduction. Peer review is a cornerstone of academic practice [1]. The peer review process is highly regarded by the vast majority of researchers and considered by most to be essential to the communication of scholarly research [2–4]. However, there is also an overwhelming desire for improvement [2, 4, 5]. Problems in peer review have consequences much beyond the outcome for a specific paper or grant, particularly due to the widespread prevalence of the Matthew effect (“rich get richer”) in academia [6]. As noted by [7] “an incompetent review may lead to the rejection of the submitted paper, or of the grant application, and the ultimate failure of the career of the author.” (See also [8, 9].) The importance of peer review and the urgent need for improvements, behooves research on principled approaches towards addressing problems in peer review, particularly at scale. In this tutorial, we discuss a number of key challenges in peer review, outline several directions of research on this topic, and also highlight important open problems that we envisage to be exciting to the community. This document summarizes the contents of the tutorial and provides relevant references.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123701843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"H2MN","authors":"Zhen Zhang, Jiajun Bu, M. Ester, Z. Li, Chengwei Yao, Zhi Yu, Can Wang","doi":"10.1145/3447548.3467328","DOIUrl":"https://doi.org/10.1145/3447548.3467328","url":null,"abstract":"Graph similarity learning, which measures the similarities between a pair of graph-structured objects, lies at the core of various machine learning tasks such as graph classification, similarity search, etc. In this paper, we devise a novel graph neural network based framework to address this challenging problem, motivated by its great success in graph representation learning. As the vast majority of existing graph neural network models mainly concentrate on learning effective node or graph level representations of a single graph, little effort has been made to jointly reason over a pair of graph-structured inputs for graph similarity learning. To this end, we propose Hierarchical Hypergraph Matching Networks (H2sup>MN) to calculate the similarities between graph pairs with arbitrary structure. Specifically, our proposed H2MN learns graph representation from the perspective of hypergraph, and takes each hyperedge as a subgraph to perform subgraph matching, which could capture the rich substructure similarities across the graph. To enable hierarchical graph representation and fast similarity computation, we further propose a hyperedge pooling operator to transform each graph into a coarse graph of reduced size. Then, a multi-perspective cross-graph matching layer is employed on the coarsened graph pairs to extract the inter-graph similarity. Comprehensive experiments on five public datasets empirically demonstrate that our proposed model can outperform state-of-the-art baselines with different gains for graph-graph classification and regression tasks.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125384072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain-oriented Language Modeling with Adaptive Hybrid Masking and Optimal Transport Alignment","authors":"Denghui Zhang, Zixuan Yuan, Yanchi Liu, Hao Liu, Fuzhen Zhuang, Hui Xiong, Haifeng Chen","doi":"10.1145/3447548.3467215","DOIUrl":"https://doi.org/10.1145/3447548.3467215","url":null,"abstract":"Motivated by the success of pre-trained language models such as BERT in a broad range of natural language processing (NLP) tasks, recent research efforts have been made for adapting these models for different application domains. Along this line, existing domain-oriented models have primarily followed the vanilla BERT architecture and have a straightforward use of the domain corpus. However, domain-oriented tasks usually require accurate understanding of domain phrases, and such fine-grained phrase-level knowledge is hard to be captured by existing pre-training scheme. Also, the word co-occurrences guided semantic learning of pre-training models can be largely augmented by entity-level association knowledge. But meanwhile, there is a risk of introducing noise due to the lack of groundtruth word-level alignment. To address the issues, we provide a generalized domain-oriented approach, which leverages auxiliary domain knowledge to improve the existing pre-training framework from two aspects. First, to preserve phrase knowledge effectively, we build a domain phrase pool as auxiliary knowledge, meanwhile we introduce Adaptive Hybrid Masked Model to incorporate such knowledge. It integrates two learning modes, word learning and phrase learning, and allows them to switch between each other. Second, we introduce Cross Entity Alignment to leverage entity association as weak supervision to augment the semantic learning of pre-trained models. To alleviate the potential noise in this process, we introduce an interpretableOptimal Transport based approach to guide alignment learning. Experiments on four domain-oriented tasks demonstrate the superiority of our framework.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114919299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Incremental Computation of Aggregations over Sliding Windows","authors":"Chao Zhang, Reza Akbarinia, F. Toumani","doi":"10.1145/3447548.3467360","DOIUrl":"https://doi.org/10.1145/3447548.3467360","url":null,"abstract":"Computing aggregation over sliding windows, i.e., finite subsets of an unbounded stream, is a core operation in streaming analytics. We propose PBA (Parallel Boundary Aggregator), a novel parallel algorithm that groups continuous slices of streaming values into chunks and exploits two buffers, cumulative slice aggregations and left cumulative slice aggregations, to compute sliding window aggregations efficiently. PBA runs in O(1) time, performing at most 3 merging operations per slide while consuming O(n) space for windows with n partial aggregations. Our empirical experiments demonstrate that PBA can improve throughput up to 4X while reducing latency, compared to state-of-the-art algorithms.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116030744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}