Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining最新文献_第4页

Friendly Conditional Text Generator 友好条件文本生成器

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining Pub Date : 2023-02-27 DOI: 10.1145/3539597.3570364

N. Kawamae

{"title":"Friendly Conditional Text Generator","authors":"N. Kawamae","doi":"10.1145/3539597.3570364","DOIUrl":"https://doi.org/10.1145/3539597.3570364","url":null,"abstract":"Our goal is to control text generation with more fine-grained conditions at lower computational cost than is possible with current alternatives; these conditions are attributes (i.e., multiple codes and free-text). As large-scale pre-trained language models (PLMs) offer excellent performance in free-form text generation, we explore efficient architectures and training schemes that can best leverage PLMs. Our framework, Friendly Conditional Text Generator (FCTG), introduces a multi-view attention (MVA) mechanism and two training tasks, Masked Attribute Modeling (MAM) and Attribute Linguistic Matching (ALM), to direct various PLMs via modalities between the text and its attributes. The motivation of FCTG is to map texts and attributes into a shared space, and bridge their modality gaps, as the texts and attributes reside in different regions of semantic space. To avoid catastrophic forgetting, modality-free embedded representations are learnt, and used to direct PLMs in this space, FCTG applies MAM to learn attribute representations, maps them in the same space as text through MVA, and optimizes their alignment in this space via ALM. Experiments on publicly available datasets show that FCTG outperforms baselines over higher level conditions at lower computation cost.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121002262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Separating Examination and Trust Bias from Click Predictions for Unbiased Relevance Ranking 从无偏相关性排序的点击预测中分离检查和信任偏差

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining Pub Date : 2023-02-27 DOI: 10.1145/3539597.3570393

Haiyuan Zhao, Jun Xu, Xiao Zhang, Guohao Cai, Zhenhua Dong, Jirong Wen

{"title":"Separating Examination and Trust Bias from Click Predictions for Unbiased Relevance Ranking","authors":"Haiyuan Zhao, Jun Xu, Xiao Zhang, Guohao Cai, Zhenhua Dong, Jirong Wen","doi":"10.1145/3539597.3570393","DOIUrl":"https://doi.org/10.1145/3539597.3570393","url":null,"abstract":"Alleviating the examination and trust bias in ranking systems is an important research line in unbiased learning-to-rank (ULTR). Current methods typically use the propensity to correct the biased user clicks and then learn ranking models based on the corrected clicks. Though successes have been achieved, directly modifying the clicks suffers from the inherent high variance because the propensities are usually involved in the denominators of corrected clicks. The problem gets even worse in the situation of mixed examination and trust bias. To address the issue, this paper proposes a novel ULTR method called Decomposed Ranking Debiasing (DRD). DRD is tailored for learning unbiased relevance models with low variance in the existence of examination and trust bias. Unlike existing methods that directly modify the original user clicks, DRD proposes to decompose each click prediction as the combination of a relevance term outputted by the ranking model and other bias terms. The unbiased relevance model, therefore, can be learned by fitting the overall click predictions to the biased user clicks. A joint learning algorithm is developed to learn the relevance and bias models' parameters alternatively. Theoretical analysis showed that, compared with existing methods, DRD has lower variance while retains unbiasedness. Empirical studies indicated that DRD can effectively reduce the variance and outperform the state-of-the-art ULTR baselines.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131198645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Integrity 2023: Integrity in Social Networks and Media 诚信2023:社交网络和媒体中的诚信

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining Pub Date : 2023-02-27 DOI: 10.1145/3539597.3572704

Lluís Garcia-Pueyo, Panayiotis Tsaparas, Prathyusha Senthil Kumar, T. Sellis, Paolo Papotti, Sibel Adali, G. Manco, Tudor Trufinescu, G. Ranade, J. Verbus, Mehmet Tek, Anthony McCosker

引用次数: 1

Beyond Hard Negatives in Product Search: Semantic Matching Using One-Class Classification (SMOCC) 在产品搜索中超越硬否定:使用单类分类(SMOCC)的语义匹配

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining Pub Date : 2023-02-27 DOI: 10.1145/3539597.3570488

Arindam Bhattacharya, Ankit Gandhi, Vijay Huddar, Ankith M S, Aayush Moroney, Atul Saroop, Rahul Bhagat

{"title":"Beyond Hard Negatives in Product Search: Semantic Matching Using One-Class Classification (SMOCC)","authors":"Arindam Bhattacharya, Ankit Gandhi, Vijay Huddar, Ankith M S, Aayush Moroney, Atul Saroop, Rahul Bhagat","doi":"10.1145/3539597.3570488","DOIUrl":"https://doi.org/10.1145/3539597.3570488","url":null,"abstract":"Semantic matching is an important component of a product search pipeline. Its goal is to capture the semantic intent of the search query as opposed to the syntactic matching performed by a lexical matching system. A semantic matching model captures relationships like synonyms, and also captures common behavioral patterns to retrieve relevant results by generalizing from purchase data. They however suffer from lack of availability of informative negative examples for model training. Various methods have been proposed in the past to address this issue based upon hard-negative mining and contrastive learning. In this work, we propose a novel method for semantic matching based on one-class classification called SMOCC. Given a query and a relevant product, SMOCC generates the representation of an informative negative which is then used to train the model. Our method is based on the idea of generating negatives by using adversarial search in the neighborhood of the positive examples. We also propose a novel approach for selecting the radius to generate adversarial negative products around queries based on the model's understanding of the query. Depending on how we select the radius, we propose two variants of our method: SMOCC-QS, that quantizes the queries using their specificity, and SMOCC-EM, that uses expectation-maximization paradigm to iteratively learn the best radius. We show that our method outperforms the state-of-the-art hard negative mining approaches by increasing the purchase recall by 3 percentage points, and improving the percentage of exacts retrieved by up to 5 percentage points while reducing irrelevant results by 1.8 percentage points.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116161791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Learning and Maximizing Influence in Social Networks Under Capacity Constraints 能力约束下的社会网络学习与影响最大化

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining Pub Date : 2023-02-27 DOI: 10.1145/3539597.3570433

Pritish Chakraborty, Sayan Ranu, Krishna Sri Ipsit Mantri, A. De

{"title":"Learning and Maximizing Influence in Social Networks Under Capacity Constraints","authors":"Pritish Chakraborty, Sayan Ranu, Krishna Sri Ipsit Mantri, A. De","doi":"10.1145/3539597.3570433","DOIUrl":"https://doi.org/10.1145/3539597.3570433","url":null,"abstract":"Influence maximization (IM) refers to the problem of finding a subset of nodes in a network through which we could maximize our reach to other nodes in the network. This set is often called the \"seed set\", and its constituent nodes maximize the social diffusion process. IM has previously been studied in various settings, including under a time deadline, subject to constraints such as that of budget or coverage, and even subject to measures other than the centrality of nodes. The solution approach has generally been to prove that the objective function is submodular, or has a submodular proxy, and thus has a close greedy approximation. In this paper, we explore a variant of the IM problem where we wish to reach out to and maximize the probability of infection of a small subset of bounded capacity K. We show that this problem does not exhibit the same submodular guarantees as the original IM problem, for which we resort to the theory of gamma-weakly submodular functions. Subsequently, we develop a greedy algorithm that maximizes our objective despite the lack of submodularity. We also develop a suitable learning model that out-competes baselines on the task of predicting the top-K infected nodes, given a seed set as input.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123761875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

DisKeyword: Tweet Corpora Exploration for Keyword Selection DisKeyword: Tweet语料库中关键词选择的探索

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining Pub Date : 2023-02-27 DOI: 10.1145/3539597.3573033

Sacha Lévy, Reihaneh Rabbany

引用次数: 0

Learning to Distill Graph Neural Networks 学习提取图神经网络

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining Pub Date : 2023-02-27 DOI: 10.1145/3539597.3570480

Cheng Yang, Yuxin Guo, Y. Xu, Chuan Shi, Jiawei Liu, Chuncheng Wang, Xin Li, Ning Guo, Hongzhi Yin

{"title":"Learning to Distill Graph Neural Networks","authors":"Cheng Yang, Yuxin Guo, Y. Xu, Chuan Shi, Jiawei Liu, Chuncheng Wang, Xin Li, Ning Guo, Hongzhi Yin","doi":"10.1145/3539597.3570480","DOIUrl":"https://doi.org/10.1145/3539597.3570480","url":null,"abstract":"Graph Neural Networks (GNNs) can effectively capture both the topology and attribute information of a graph, and have been extensively studied in many domains. Recently, there is an emerging trend that equips GNNs with knowledge distillation for better efficiency or effectiveness. However, to the best of our knowledge, existing knowledge distillation methods applied on GNNs all employed predefined distillation processes, which are controlled by several hyper-parameters without any supervision from the performance of distilled models. Such isolation between distillation and evaluation would lead to suboptimal results. In this work, we aim to propose a general knowledge distillation framework that can be applied on any pretrained GNN models to further improve their performance. To address the isolation problem, we propose to parameterize and learn distillation processes suitable for distilling GNNs. Specifically, instead of introducing a unified temperature hyper-parameter as most previous work did, we will learn node-specific distillation temperatures towards better performance of distilled models. We first parameterize each node's temperature by a function of its neighborhood's encodings and predictions, and then design a novel iterative learning process for model distilling and temperature learning. We also introduce a scalable variant of our method to accelerate model training. Experimental results on five benchmark datasets show that our proposed framework can be applied on five popular GNN models and consistently improve their prediction accuracies with 3.12% relative enhancement on average. Besides, the scalable variant enables 8 times faster training speed at the cost of 1% prediction accuracy.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123074900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

SoCraft: Advertiser-level Predictive Scoring for Creative Performance on Meta SoCraft:广告商层面的Meta创意表现预测评分

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining Pub Date : 2023-02-27 DOI: 10.1145/3539597.3573032

Alfred Huang, Qi Yang, Sergey I. Nikolenko, Marlo Ongpin, Ilia Gossoudarev, Ngoc Yen Duong, Kirill Lepikhin, Sergey Vishnyakov, Chu-Farseeva Yuyi, Aleksandr Farseev

引用次数: 1

S2TUL: A Semi-Supervised Framework for Trajectory-User Linking 轨迹-用户链接的半监督框架

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining Pub Date : 2023-02-27 DOI: 10.1145/3539597.3570410

Liwei Deng, Hao-Lun Sun, Yan Zhao, Shuncheng Liu, Kai Zheng

{"title":"S2TUL: A Semi-Supervised Framework for Trajectory-User Linking","authors":"Liwei Deng, Hao-Lun Sun, Yan Zhao, Shuncheng Liu, Kai Zheng","doi":"10.1145/3539597.3570410","DOIUrl":"https://doi.org/10.1145/3539597.3570410","url":null,"abstract":"Trajectory-User Linking (TUL) aiming to identify users of anonymous trajectories, has recently received increasing attention due to its wide range of applications, such as criminal investigation and personalized recommendation systems. In this paper, we propose a flexible Semi-Supervised framework for Trajectory-User Linking, namely S2TUL, which includes five components: trajectory-level graph construction, trajectory relation modeling, location-level sequential modeling, a classification layer and greedy trajectory-user relinking. The first two components are proposed to model the relationships among trajectories, in which three homogeneous graphs and two heterogeneous graphs are firstly constructed and then delivered into the graph convolutional networks for converting the discrete identities to hidden representations. Since the graph constructions are irrelevant to the corresponding users, the unlabelled trajectories can also be included in the graphs, which enables the framework to be trained in a semi-supervised way. Afterwards, the location-level sequential modeling component is designed to capture fine-grained intra-trajectory information by passing the trajectories into the sequential neural networks. Finally, these two level representations are concatenated into a classification layer to predict the user of the input trajectory. In the testing phase, a greedy trajectory-user relinking method is proposed to assure the linking results satisfy the timespan overlap constraint. We conduct extensive experiments on three public datasets with six representative competitors. The evaluation results demonstrate the effectiveness of the proposed framework.","PeriodicalId":227804,"journal":{"name":"Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132504671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

From Classic GNNs to Hyper-GNNs for Detecting Camouflaged Malicious Actors 从经典gnn到超级gnn检测伪装的恶意行为者

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining Pub Date : 2023-02-27 DOI: 10.1145/3539597.3572989

Venus Haghighi

引用次数: 0