Proceedings of the 30th ACM International Conference on Information & Knowledge Management: Latest Papers (Page 7)

SCOPA

Proceedings of the 30th ACM International Conference on Information & Knowledge Management Pub Date : 2021-10-26 DOI: 10.1145/3459637.3482176

Dohyeon Lee, Jaeseong Lee, Gyewon Lee, Byung-Gon Chun, Seung-won Hwang

Abstract: The recent advent of cross-lingual embeddings, such as multilingual BERT (mBERT), provides a strong baseline for zero-shot cross-lingual transfer. There is also growing research attention on reducing the alignment discrepancy of cross-lingual embeddings between source and target languages by generating code-switched sentences, in which randomly selected words in the source language are substituted with their counterparts in the target language. Although these approaches improve performance, naively code-switched sentences can have inherent limitations. In this paper, we propose SCOPA, a novel technique to improve the performance of zero-shot cross-lingual transfer. Instead of using the embeddings of code-switched sentences directly, SCOPA mixes them softly with the embeddings of the original sentences. In addition, SCOPA utilizes an additional pairwise alignment objective, which aligns the vector differences of word pairs instead of word-level embeddings, in order to transfer contextualized information between different languages while preserving language-specific information. Experiments on the PAWS-X and MLDoc datasets show the effectiveness of SCOPA.

Citations: 5
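
The soft mixing mentioned in the SCOPA abstract above can be pictured as a convex combination of two embedding vectors. The following is only a minimal sketch under stated assumptions: the function name `soft_mix`, the mixing ratio `lam`, and the use of plain Python lists are all hypothetical, and the abstract does not give the actual mixing formula used in the paper.

```python
from typing import List


def soft_mix(original: List[float], code_switched: List[float], lam: float) -> List[float]:
    """Return (1 - lam) * original + lam * code_switched, element by element.

    `lam` is a hypothetical mixing ratio; the abstract does not specify one.
    """
    if len(original) != len(code_switched):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [(1.0 - lam) * o + lam * c for o, c in zip(original, code_switched)]


# Toy 2-dimensional embeddings, chosen only for illustration.
mixed = soft_mix([1.0, 0.0], [0.0, 1.0], lam=0.25)
print(mixed)  # [0.75, 0.25]
```

With `lam = 0` the original embedding is returned unchanged, and with `lam = 1` the code-switched embedding replaces it entirely, which corresponds to the "direct" use that the abstract contrasts against.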

CIKM 2021 Tutorial on Fairness of Machine Learning in Recommender Systems

Proceedings of the 30th ACM International Conference on Information & Knowledge Management Pub Date : 2021-10-26 DOI: 10.1145/3459637.3483280

Yunqi Li, Yingqiang Ge, Yongfeng Zhang

Abstract: Recently, there has been growing attention to fairness considerations in machine learning. As one of the most pervasive applications of machine learning, recommender systems have an increasing and critical impact on people and society, since a growing number of users rely on them for information seeking and decision making. It is therefore crucial to address potential unfairness problems in recommendation, which may hurt users' or providers' satisfaction with recommender systems as well as the interests of the platforms. The tutorial focuses on the foundations and algorithms for fairness in recommendation. It also gives a brief introduction to fairness in basic machine learning tasks such as classification and ranking. The tutorial will introduce the taxonomies of current fairness definitions and the evaluation metrics for fairness concerns. We will introduce previous work on fairness in recommendation and put forward future research directions. The tutorial aims to introduce and communicate fairness-aware recommendation methods to the community, and to bring together researchers and practitioners interested in this direction for discussion, exchange of ideas, and promotion of research.

Citations: 3

Query Reformulation for Descriptive Queries of Jargon Words Using a Knowledge Graph based on a Dictionary

Proceedings of the 30th ACM International Conference on Information & Knowledge Management Pub Date : 2021-10-26 DOI: 10.1145/3459637.3482382

Bosung Kim, H. Choi, Haeun Yu, Youngjoong Ko

Abstract: Query reformulation (QR) is a key factor in overcoming the problems caused by the lexical chasm in information retrieval (IR) systems. In particular, when searching for jargon, people tend to use descriptive queries, such as "a medical examination of the colon" rather than "colonoscopy," or they often use the two interchangeably. Thus, transforming users' descriptive queries into appropriate jargon queries helps to retrieve more relevant documents. In this paper, we propose a new graph-based QR system that uses a dictionary and does not require human-labeled data. Given a descriptive query, our system predicts the corresponding jargon word over a graph consisting of pairs of a headword and its description in the dictionary. First, we train a graph neural network to represent the relational properties between words and to infer a jargon word using compositional information from the words of the descriptive query. Moreover, we propose a graph search model that finds the target node in real time using the relevance scores of neighboring nodes. By adding this fast graph search model to the front of the proposed system, we reduce the reformulation time significantly. Experimental results on two datasets show that the proposed method can effectively reformulate descriptive queries into the corresponding jargon words and improve retrieval performance under several search frameworks.

Citations: 5

Answering POI-recommendation Questions using Tourism Reviews

Proceedings of the 30th ACM International Conference on Information & Knowledge Management Pub Date : 2021-10-26 DOI: 10.1145/3459637.3482320

Danish Contractor, Krunal Shah, Aditi Partap, Parag Singla, Mausam Mausam

Abstract: We introduce the novel and challenging task of answering point-of-interest (POI) recommendation questions, using a collection of reviews that describe candidate answer entities (POIs). We harvest a QA dataset that contains 47,124 paragraph-sized user questions from travelers seeking POI recommendations for hotels, attractions and restaurants. Each question can have thousands of candidate entities to choose from, and each candidate is associated with a collection of unstructured reviews. Questions can include requirements based on physical location, budget, and timings, as well as other subjective considerations related to ambience, quality of service, etc. Our dataset requires reasoning over a large number of candidate answer entities (over 5300 per question on average), and we find that running commonly used neural architectures for QA is prohibitively expensive. Further, commonly used retriever-ranker based methods also do not work well for our task due to the nature of review documents. Thus, as a first attempt at addressing some of the novel challenges of reasoning-at-scale posed by our task, we present a task-specific baseline model that uses a three-stage cluster-select-rerank architecture. The model first clusters text for each entity to identify exemplar sentences describing an entity. It then uses a neural information retrieval (IR) module to select a set of potential entities from the large candidate set. A reranker uses a deeper attention-based architecture to pick the best answers from the selected entities. This strategy performs better than a pure retrieval or a pure attention-based reasoning approach, yielding nearly 25% relative improvement in Hits@3 over both approaches. To the best of our knowledge, we are the first to present an unstructured QA-style task for POI recommendation, using real-world tourism questions and POI reviews.

Citations: 1
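
The POI entry above reports its improvement in Hits@3. As a reminder of how such a metric is commonly computed, here is a minimal sketch: a question scores 1 if at least one gold answer appears among the top k ranked candidates, and the scores are averaged over questions. The function names and the toy entity identifiers are hypothetical, and the paper's own evaluation script may differ in details such as tie handling.

```python
from typing import List, Sequence, Set, Tuple


def hits_at_k(ranked: Sequence[str], gold: Set[str], k: int = 3) -> float:
    """1.0 if any gold entity is among the top-k ranked entities, else 0.0."""
    return 1.0 if any(item in gold for item in ranked[:k]) else 0.0


def mean_hits_at_k(runs: List[Tuple[Sequence[str], Set[str]]], k: int = 3) -> float:
    """Average Hits@k over (ranking, gold set) pairs; 0.0 for an empty list."""
    if not runs:
        return 0.0
    return sum(hits_at_k(ranked, gold, k) for ranked, gold in runs) / len(runs)


# Two toy questions sharing one ranking; the gold answer is in the top 3 only for the first.
ranking = ["hotel_a", "hotel_b", "hotel_c", "hotel_d"]
runs = [(ranking, {"hotel_c"}), (ranking, {"hotel_d"})]
print(mean_hits_at_k(runs, k=3))  # 0.5
```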

Locker: Locally Constrained Self-Attentive Sequential Recommendation

Proceedings of the 30th ACM International Conference on Information & Knowledge Management Pub Date : 2021-10-26 DOI: 10.1145/3459637.3482136

Zhankui He, Handong Zhao, Zhe Lin, Zhaowen Wang, Ajinkya Kale, Julian McAuley

Abstract: Recently, self-attentive models have shown promise in sequential recommendation, given their potential to capture users' long-term preferences and short-term dynamics simultaneously. Despite their success, we argue that self-attention modules, as non-local operators, often fail to capture short-term user dynamics accurately due to a lack of inductive local bias. To examine our hypothesis, we conduct an analytical experiment on controlled 'short-term' scenarios. We observe a significant performance gap between self-attentive recommenders with and without local constraints, which implies that short-term user dynamics are not sufficiently learned by existing self-attentive recommenders. Motivated by this observation, we propose a simple framework, Locker, for self-attentive recommenders that works in a plug-and-play fashion. By combining the proposed local encoders with existing global attention heads, Locker enhances the modeling of short-term user dynamics while retaining the long-term semantics captured by standard self-attentive encoders. We investigate Locker with five different local methods, outperforming state-of-the-art self-attentive recommenders on three datasets by 17.19% (NDCG@20) on average.

Citations: 31
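
To make the idea of a "local constraint" on self-attention in the Locker entry above more concrete, the sketch below builds a fixed-window causal attention mask. This is an illustrative assumption only: the abstract mentions five local methods without naming them, so a windowed mask is not claimed to be one of them, and `local_causal_mask` and `window` are hypothetical names.

```python
from typing import List


def local_causal_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j.

    Position i sees only itself and the (window - 1) items just before it.
    """
    if seq_len < 0 or window < 1:
        raise ValueError("seq_len must be >= 0 and window must be >= 1")
    return [[i - window < j <= i for j in range(seq_len)] for i in range(seq_len)]


# With window=2, each position attends to itself and its immediate predecessor.
for row in local_causal_mask(4, window=2):
    print(["x" if allowed else "." for allowed in row])
```

A standard causal self-attention mask is the special case where `window` is at least `seq_len`; shrinking the window is one simple way to bias a head toward recent interactions.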

Fraud Detection under Multi-Sourced Extremely Noisy Annotations

Proceedings of the 30th ACM International Conference on Information & Knowledge Management Pub Date : 2021-10-26 DOI: 10.1145/3459637.3482433

Chuang Zhang, Qizhou Wang, Tengfei Liu, Xun Lu, Jin Hong, Bo Han, Chen Gong

Abstract: Fraud detection in e-commerce, which is critical to protecting the capital safety of users and financial corporations, aims to determine whether an online transaction or other activity is fraudulent. This problem has previously been addressed by various fully supervised learning methods. However, the true labels for training a supervised fraud detection model are difficult to collect in many real-world cases. To circumvent this issue, a series of automatic annotation techniques are employed instead to generate multiple noisy annotations for each unknown activity. In order to utilize these low-quality, multi-sourced annotations to achieve reliable detection results, we propose an iterative two-stage fraud detection framework for multi-sourced, extremely noisy annotations. In the label aggregation stage, multi-sourced labels are integrated by voting with adaptive weights; in the label correction stage, the correctness of the aggregated labels is estimated with the help of a handful of exactly labeled data, and the results are used to train a robust fraud detector. These two stages benefit from each other, and their iterative execution leads to steadily improved detection results. Therefore, our method is termed "Label Aggregation and Correction" (LAC). Experimentally, we collect millions of transaction records from Alipay in two different fraud detection scenarios, i.e., credit card theft and promotion abuse fraud. When compared with state-of-the-art counterparts, our method achieves improvements of at least 0.019 and 0.117 in average AUC on the two collected datasets, which clearly demonstrates its effectiveness.

Citations: 5
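
The label aggregation stage in the LAC entry above combines several noisy annotations by weighted voting. The sketch below shows only the voting step with fixed, hand-picked weights; how LAC adapts the weights across iterations is not described in the abstract, so the names `weighted_vote`, `weights`, and `threshold` are hypothetical.

```python
from typing import Sequence


def weighted_vote(labels: Sequence[int], weights: Sequence[float], threshold: float = 0.5) -> int:
    """Aggregate binary labels (1 = fraud, 0 = normal) from several annotators.

    Returns 1 when the weight share of annotators voting 1 reaches `threshold`.
    """
    if len(labels) != len(weights) or not labels:
        raise ValueError("labels and weights must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    positive_share = sum(w for label, w in zip(labels, weights) if label == 1) / total
    return 1 if positive_share >= threshold else 0


votes = [1, 0, 1]
print(weighted_vote(votes, [1.0, 1.0, 1.0]))  # 1: plain majority vote
print(weighted_vote(votes, [1.0, 3.0, 1.0]))  # 0: the trusted second annotator prevails
```

The two calls use the same votes and differ only in the weights, which is why a scheme that learns which annotation sources to trust can change the aggregated label.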

CloudRCA: A Root Cause Analysis Framework for Cloud Computing Platforms

Proceedings of the 30th ACM International Conference on Information & Knowledge Management Pub Date : 2021-10-26 DOI: 10.1145/3459637.3481903

Yingying Zhang, Zhengxiong Guan, Huajie Qian, Leili Xu, Hengbo Liu, Qingsong Wen, Liang Sun, Junwei Jiang, L. Fan, Minhui Ke

Abstract: As Alibaba's business expands across the world and across various industries, higher standards are imposed on the service quality and reliability of the big data cloud computing platforms that constitute the infrastructure of Alibaba Cloud. However, root cause analysis in these platforms is non-trivial due to the complicated system architecture. In this paper, we propose a root cause analysis framework called CloudRCA, which makes use of heterogeneous multi-source data including Key Performance Indicators (KPIs), logs, and topology, and extracts important features via state-of-the-art anomaly detection and log analysis techniques. The engineered features are then utilized in a Knowledge-informed Hierarchical Bayesian Network (KHBN) model to infer root causes with high accuracy and efficiency. An ablation study and comprehensive experimental comparisons demonstrate that, compared to existing frameworks, CloudRCA 1) consistently outperforms existing approaches in F1-score across different cloud systems; 2) can handle novel types of root causes thanks to the hierarchical structure of KHBN; 3) performs more robustly with respect to algorithmic configurations; and 4) scales more favorably with data and feature sizes. Experiments also show that a cross-platform transfer learning mechanism can be adopted to further improve the accuracy by more than 10%. CloudRCA has been integrated into the diagnosis system of Alibaba Cloud and employed in three typical cloud computing platforms: MaxCompute, Realtime Compute and Hologres. Over the past twelve months, it has saved Site Reliability Engineers (SREs) more than 20% of the time spent on resolving failures and has improved service reliability significantly.

Citations: 21

50 Ways to Bake a Cookie: Mapping the Landscape of Procedural Texts

Proceedings of the 30th ACM International Conference on Information & Knowledge Management Pub Date : 2021-10-26 DOI: 10.1145/3459637.3482405

Moran Mizrahi, Dafna Shahaf

Citations: 1

Cformer

Proceedings of the 30th ACM International Conference on Information & Knowledge Management Pub Date : 2021-10-26 DOI: 10.1145/3459637.3482073

Arezoo Hatefi, Xuan-Son Vu, M. Bhuyan, F. Drewes

Citations: 3

MVQAS

Proceedings of the 30th ACM International Conference on Information & Knowledge Management Pub Date : 2021-10-26 DOI: 10.1145/3459637.3481971

Haoyue Bai, Xiaoyan Shan, Yefan Huang, Xiaoli Wang

Abstract: This paper demonstrates a medical visual question answering (VQA) system that addresses three challenges: 1) medical VQA often lacks large-scale labeled training data, which requires huge effort to build; 2) it is costly to implement and thoroughly compare medical VQA models on self-created datasets; 3) applying general VQA models to the medical domain by transfer learning is challenging because visual concepts differ between general images and medical images. Our system has three main components: data generation, model library, and model practice. To address the first challenge, we allow users to upload self-collected clinical data, such as electronic medical records (EMRs), to the data generation component and provide an annotation tool for labeling the data. The system then semi-automatically generates medical VQA pairs for users. Second, we develop a model library by implementing VQA models that users can evaluate on their datasets. Users can perform simple configuration by selecting the models they are interested in. The system then automatically trains the models, conducts extensive experimental evaluation, and reports comprehensive findings. The reports provide new insights into the strengths and weaknesses of the selected models. Third, we provide an online chat module that lets users communicate with an AI robot to further evaluate the models. The source code is shared at https://github.com/shyanneshan/VQA-Demo.

Citations: 2