Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining: Latest Papers

Workshop on Model Mining
Shan You, Chang Xu, Fei Wang, Changshui Zhang
{"title":"Workshop on Model Mining","authors":"Shan You, Chang Xu, Fei Wang, Changshui Zhang","doi":"10.1145/3447548.3469471","DOIUrl":"https://doi.org/10.1145/3447548.3469471","url":null,"abstract":"How to mine the knowledge in the pretrained models is of significance in achieving more promising performance, since practitioners have access to many pretrained models easily. This Workshop on Model Mining aims to investigate more diverse and advanced manners in mining knowledge within models, which tends to leverage the pretrained models more wisely, elegantly and systematically. There are many topics related to this workshop, such as distilling a lightweight model from a well-trained heavy model via teacher-student paradigm, and boosting the performance of the model by carefully designing the predecessor tasks, e.g., pre-training, self-supervised and contrastive learning. Model mining as a special way of data mining is relevant to SIGKDD, and its audience including researchers and engineers will benefit a lot for designing more advanced algorithms for their tasks.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"249 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115233739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Curriculum Meta-Learning for Next POI Recommendation
Yudong Chen, Xin Wang, M. Fan, Jizhou Huang, Shengwen Yang, Wenwu Zhu
{"title":"Curriculum Meta-Learning for Next POI Recommendation","authors":"Yudong Chen, Xin Wang, M. Fan, Jizhou Huang, Shengwen Yang, Wenwu Zhu","doi":"10.1145/3447548.3467132","DOIUrl":"https://doi.org/10.1145/3447548.3467132","url":null,"abstract":"Next point-of-interest (POI) recommendation is a hot research field where a recent emerging scenario, next POI to search recommendation, has been deployed in many online map services such as Baidu Maps. One of the key issues in this scenario is providing satisfactory recommendation services for cold-start cities with a limited number of user-POI interactions, which requires transferring the knowledge hidden in rich data from many other cities to these cold-start cities. Existing literature either does not consider the city-transfer issue or cannot simultaneously tackle the data sparsity and pattern diversity issues among various users in multiple cities. To address these issues, we explore city-transfer next POI to search recommendation that transfers the knowledge from multiple cities with rich data to cold-start cities with scarce data. We propose a novel Curriculum Hardness Aware Meta-Learning (CHAML) framework, which incorporates hard sample mining and curriculum learning into a meta-learning paradigm. Concretely, the CHAML framework considers both city-level and user-level hardness to enhance the conditional sampling during meta training, and uses an easy-to-hard curriculum for the city-sampling pool to help the meta-learner converge to a better state. 
Extensive experiments on two real-world map search datasets from Baidu Maps demonstrate the superiority of CHAML framework.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127173937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 45
Addressing Non-Representative Surveys using Multiple Instance Learning
Yaniv Katz, O. Vainas
{"title":"Addressing Non-Representative Surveys using Multiple Instance Learning","authors":"Yaniv Katz, O. Vainas","doi":"10.1145/3447548.3467109","DOIUrl":"https://doi.org/10.1145/3447548.3467109","url":null,"abstract":"In recent years, non representative survey sampling and non response bias constitute major obstacles in obtaining a reliable population quantity estimate from finite survey samples. As such, researchers have been focusing on identifying methods to resolve these biases. In this paper, we look at this well known problem from a fresh perspective, and formulate it as a learning problem. To meet this challenge, we suggest solving the learning problem using a multiple instance learning (MIL) paradigm. We devise two different MIL based neural network topologies, each based on a different implementation of an attention pooling layer. These models are trained to accurately infer the population quantity of interest even when facing a biased sample. To the best of our knowledge, this is the first time MIL has ever been suggested as a solution to this problem. In contrast to commonly used statistical methods, this approach can be accomplished without having to collect sensitive personal data of the respondents and without having to access population level statistics of the same sensitive data. To validate the effectiveness of our approaches, we test them on a real-world movie rating dataset which is used to mimic a biased survey by experimentally contaminating it with different kinds of survey bias. We show that our suggested topologies outperform other MIL architectures, and are able to partly counter the adverse effect of biased sampling on the estimation quality. 
We also demonstrate how these methods can be easily adapted to perform well even when part of the survey is based on a small number of respondents.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127212991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
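The abstract above describes MIL networks built around an attention pooling layer. As a rough illustration only, the sketch below shows one common form of attention pooling over a bag of instance embeddings: each instance gets a score, the scores pass through a softmax, and the bag embedding is the weighted sum of the instances. The scoring rule (a dot product of a vector `w` with the element-wise tanh of the embedding), the name `attention_pool`, and the toy numbers are assumptions made for illustration; they are not the authors' two topologies.

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(
    instances: Sequence[Sequence[float]], w: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Pool a bag of instance embeddings into a single bag embedding.

    `w` is a hypothetical scoring vector (an assumption of this sketch).
    Returns (pooled_embedding, attention_weights).
    """
    # One scalar score per instance: w . tanh(h)
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h)) for h in instances]
    # Numerically stable softmax over the scores
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the instances, dimension by dimension
    dim = len(instances[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return pooled, weights


# Toy bag of three 2-dimensional instance embeddings (made-up values)
bag = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
pooled, weights = attention_pool(bag, w=[0.5, -0.25])
```

Because each weight depends only on its own instance and the pooled vector is a sum, the result does not depend on the order of the instances in the bag, which is the property that makes this kind of layer usable for set-valued inputs such as a group of survey respondents.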
Differentiable Pattern Set Mining
Jonas Fischer, Jilles Vreeken
{"title":"Differentiable Pattern Set Mining","authors":"Jonas Fischer, Jilles Vreeken","doi":"10.1145/3447548.3467348","DOIUrl":"https://doi.org/10.1145/3447548.3467348","url":null,"abstract":"Pattern set mining has been successful in discovering small sets of highly informative and useful patterns from data. To find good models, existing methods heuristically explore the twice-exponential search space over all possible pattern sets in a combinatorial way, by which they are limited to data over at most hundreds of features, as well as likely to get stuck in local minima. Here, we propose a gradient based optimization approach that allows us to efficiently discover high-quality pattern sets from data of millions of rows and hundreds of thousands of features. In particular, we propose a novel type of neural autoencoder called BinaPs, using binary activations and binarizing weights in each forward pass, which are directly interpretable as conjunctive patterns. For training, optimizing a data-sparsity aware reconstruction loss, continuous versions of the weights are learned in small, noisy steps. This formulation provides a link between the discrete search space and continuous optimization, thus allowing for a gradient based strategy to discover sets of high-quality and noise-robust patterns. 
Through extensive experiments on both synthetic and real world data, we show that BinaPs discovers high quality and noise robust patterns, and unique among all competitors, easily scales to data of supermarket transactions or biological variant calls.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124934347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
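The abstract above says that weights are binarized in each forward pass and can be read directly as conjunctive patterns, while continuous versions of the weights are what training updates. The sketch below illustrates only that reading step under stated assumptions: the 0.5 threshold, the helper names `binarize`, `as_pattern` and `pattern_fires`, and the rule that a pattern fires when every one of its items is present are all hypothetical choices for this example, and no training or loss is shown. It is not the BinaPs implementation.

```python
from typing import List, Sequence, Set


def binarize(weights: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map continuous weights to {0, 1}; the threshold is an assumption."""
    return [1 if w > threshold else 0 for w in weights]


def as_pattern(binary_row: Sequence[int]) -> Set[int]:
    """Read a binarized weight row as the set of feature indices it selects."""
    return {i for i, b in enumerate(binary_row) if b == 1}


def pattern_fires(row: Sequence[int], pattern: Set[int]) -> bool:
    """Conjunction: true only if every item of a non-empty pattern is present."""
    return bool(pattern) and all(row[i] == 1 for i in pattern)


# Continuous weights of one hypothetical hidden unit over four features
continuous = [0.91, 0.08, 0.77, 0.35]
pattern = as_pattern(binarize(continuous))

# Three toy binary transactions (made-up data)
transactions = [[1, 0, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
support = sum(pattern_fires(t, pattern) for t in transactions)
```

Keeping the continuous weights separate from their binarized view is what lets a gradient-based optimizer make small updates while the model that is actually evaluated stays discrete and interpretable.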
The Third International TrueFact Workshop: Making a Credible Web for Tomorrow
Subhabrata Mukherjee, Qi Li, Sihong Xie, Philip S. Yu, Jing Gao
{"title":"The Third International TrueFact Workshop: Making a Credible Web for Tomorrow","authors":"Subhabrata Mukherjee, Qi Li, Sihong Xie, Philip S. Yu, Jing Gao","doi":"10.1145/3447548.3469467","DOIUrl":"https://doi.org/10.1145/3447548.3469467","url":null,"abstract":"The Third International TrueFact Workshop: Making a Credible Web for Tomorrow is geared towards bringing academic, industry and government researchers and practitioners together to tackle the challenges in misinformation, data quality, truth finding, fact-checking, credibility analysis and rumor detection -- in heterogeneous and multi-modal sources of information including texts, images, videos, relational data, social networks and knowledge graphs.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125940136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
The Second International MIS2 Workshop: Misinformation and Misbehavior Mining on the Web
A. Hofleitner, Meng Jiang, Srijan Kumar, Neil Shah, Kai Shu
{"title":"The Second International MIS2 Workshop: Misinformation and Misbehavior Mining on the Web","authors":"A. Hofleitner, Meng Jiang, Srijan Kumar, Neil Shah, Kai Shu","doi":"10.1145/3447548.3469443","DOIUrl":"https://doi.org/10.1145/3447548.3469443","url":null,"abstract":"Misinformation and misbehavior mining on the web (MIS2) workshop is held virtually on August 14, 2021 and is co-located with the ACM SIGKDD 2021 conference. The web has become a breeding ground for misbehavior and misinformation. It is timely and crucial to understand, detect, forecast, and mitigate their harm. MIS2 workshop as an interdisciplinary venue for researchers and practitioners who study the dark side of the web. The workshop program includes a peer-reviewed set of paper presentations and keynote talks, giving the attendees an immersive experience of this research field.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"538 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123245811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
KDD 2021 Tutorial on Systemic Challenges and Solutions on Bias and Unfairness in Peer Review
Nihar B. Shah
{"title":"KDD 2021 Tutorial on Systemic Challenges and Solutions on Bias and Unfairness in Peer Review","authors":"Nihar B. Shah","doi":"10.1145/3447548.3470826","DOIUrl":"https://doi.org/10.1145/3447548.3470826","url":null,"abstract":"Introduction. Peer review is a cornerstone of academic practice [1]. The peer review process is highly regarded by the vast majority of researchers and considered by most to be essential to the communication of scholarly research [2–4]. However, there is also an overwhelming desire for improvement [2, 4, 5]. Problems in peer review have consequences much beyond the outcome for a specific paper or grant, particularly due to the widespread prevalence of the Matthew effect (“rich get richer”) in academia [6]. As noted by [7] “an incompetent review may lead to the rejection of the submitted paper, or of the grant application, and the ultimate failure of the career of the author.” (See also [8, 9].) The importance of peer review and the urgent need for improvements, behooves research on principled approaches towards addressing problems in peer review, particularly at scale. In this tutorial, we discuss a number of key challenges in peer review, outline several directions of research on this topic, and also highlight important open problems that we envisage to be exciting to the community. 
This document summarizes the contents of the tutorial and provides relevant references.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123701843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
H2MN
Zhen Zhang, Jiajun Bu, M. Ester, Z. Li, Chengwei Yao, Zhi Yu, Can Wang
{"title":"H2MN","authors":"Zhen Zhang, Jiajun Bu, M. Ester, Z. Li, Chengwei Yao, Zhi Yu, Can Wang","doi":"10.1145/3447548.3467328","DOIUrl":"https://doi.org/10.1145/3447548.3467328","url":null,"abstract":"Graph similarity learning, which measures the similarities between a pair of graph-structured objects, lies at the core of various machine learning tasks such as graph classification, similarity search, etc. In this paper, we devise a novel graph neural network based framework to address this challenging problem, motivated by its great success in graph representation learning. As the vast majority of existing graph neural network models mainly concentrate on learning effective node or graph level representations of a single graph, little effort has been made to jointly reason over a pair of graph-structured inputs for graph similarity learning. To this end, we propose Hierarchical Hypergraph Matching Networks (H2MN) to calculate the similarities between graph pairs with arbitrary structure. Specifically, our proposed H2MN learns graph representation from the perspective of hypergraph, and takes each hyperedge as a subgraph to perform subgraph matching, which could capture the rich substructure similarities across the graph. To enable hierarchical graph representation and fast similarity computation, we further propose a hyperedge pooling operator to transform each graph into a coarse graph of reduced size. Then, a multi-perspective cross-graph matching layer is employed on the coarsened graph pairs to extract the inter-graph similarity. 
Comprehensive experiments on five public datasets empirically demonstrate that our proposed model can outperform state-of-the-art baselines with different gains for graph-graph classification and regression tasks.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125384072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Domain-oriented Language Modeling with Adaptive Hybrid Masking and Optimal Transport Alignment
Denghui Zhang, Zixuan Yuan, Yanchi Liu, Hao Liu, Fuzhen Zhuang, Hui Xiong, Haifeng Chen
{"title":"Domain-oriented Language Modeling with Adaptive Hybrid Masking and Optimal Transport Alignment","authors":"Denghui Zhang, Zixuan Yuan, Yanchi Liu, Hao Liu, Fuzhen Zhuang, Hui Xiong, Haifeng Chen","doi":"10.1145/3447548.3467215","DOIUrl":"https://doi.org/10.1145/3447548.3467215","url":null,"abstract":"Motivated by the success of pre-trained language models such as BERT in a broad range of natural language processing (NLP) tasks, recent research efforts have been made for adapting these models for different application domains. Along this line, existing domain-oriented models have primarily followed the vanilla BERT architecture and have a straightforward use of the domain corpus. However, domain-oriented tasks usually require accurate understanding of domain phrases, and such fine-grained phrase-level knowledge is hard to be captured by existing pre-training scheme. Also, the word co-occurrences guided semantic learning of pre-training models can be largely augmented by entity-level association knowledge. But meanwhile, there is a risk of introducing noise due to the lack of groundtruth word-level alignment. To address the issues, we provide a generalized domain-oriented approach, which leverages auxiliary domain knowledge to improve the existing pre-training framework from two aspects. First, to preserve phrase knowledge effectively, we build a domain phrase pool as auxiliary knowledge, meanwhile we introduce Adaptive Hybrid Masked Model to incorporate such knowledge. It integrates two learning modes, word learning and phrase learning, and allows them to switch between each other. Second, we introduce Cross Entity Alignment to leverage entity association as weak supervision to augment the semantic learning of pre-trained models. To alleviate the potential noise in this process, we introduce an interpretable Optimal Transport based approach to guide alignment learning. 
Experiments on four domain-oriented tasks demonstrate the superiority of our framework.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114919299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Efficient Incremental Computation of Aggregations over Sliding Windows
Chao Zhang, Reza Akbarinia, F. Toumani
{"title":"Efficient Incremental Computation of Aggregations over Sliding Windows","authors":"Chao Zhang, Reza Akbarinia, F. Toumani","doi":"10.1145/3447548.3467360","DOIUrl":"https://doi.org/10.1145/3447548.3467360","url":null,"abstract":"Computing aggregation over sliding windows, i.e., finite subsets of an unbounded stream, is a core operation in streaming analytics. We propose PBA (Parallel Boundary Aggregator), a novel parallel algorithm that groups continuous slices of streaming values into chunks and exploits two buffers, cumulative slice aggregations and left cumulative slice aggregations, to compute sliding window aggregations efficiently. PBA runs in O(1) time, performing at most 3 merging operations per slide while consuming O(n) space for windows with n partial aggregations. Our empirical experiments demonstrate that PBA can improve throughput up to 4X while reducing latency, compared to state-of-the-art algorithms.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116030744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
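The abstract above describes grouping slices into chunks and keeping two buffers of cumulative aggregations so that each window needs only a constant number of merges. The sketch below is not PBA; it is the classic chunked prefix/suffix technique (often attributed to van Herk and to Gil and Werman), shown here in a non-parallel, batch form over a finite list as an illustration of the same general idea. The function name `sliding_aggregate` and the sample data are assumptions for this example, and `op` is assumed to be associative.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def sliding_aggregate(
    xs: Sequence[T], w: int, op: Callable[[T, T], T]
) -> List[T]:
    """Aggregate every length-w window of xs with an associative op.

    Uses chunks of size w with per-chunk running aggregates from the left
    (prefix) and from the right (suffix), so each window costs at most one
    merge after a linear precomputation.
    """
    n = len(xs)
    if w <= 0 or w > n:
        return []
    prefix = list(xs)  # prefix[j]: aggregate from the chunk start up to j
    suffix = list(xs)  # suffix[j]: aggregate from j to the chunk end
    for j in range(n):
        if j % w != 0:  # not the first element of its chunk
            prefix[j] = op(prefix[j - 1], xs[j])
    for j in range(n - 2, -1, -1):
        if (j + 1) % w != 0:  # not the last element of its chunk
            suffix[j] = op(xs[j], suffix[j + 1])
    out: List[T] = []
    for i in range(n - w + 1):
        if i % w == 0:
            out.append(prefix[i + w - 1])  # window is exactly one chunk
        else:
            # Tail of one chunk merged with the head of the next, in order
            out.append(op(suffix[i], prefix[i + w - 1]))
    return out


values = [3, 1, 4, 1, 5, 9, 2, 6]
window_max = sliding_aggregate(values, 3, max)
```

A window that straddles two chunks is the tail of the first chunk followed by the head of the second, so merging the two buffered values in that order also works for non-commutative operations such as string concatenation. Extra storage is linear in the input for this batch version.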