{"title":"librec-auto: A Tool for Recommender Systems Experimentation","authors":"Nasim Sonboli, M. Mansoury, Ziyue Guo, Shreyas Kadekodi, Weiwen Liu, Zijun Liu, Andrew Schwartz, R. Burke","doi":"10.1145/3459637.3482006","DOIUrl":"https://doi.org/10.1145/3459637.3482006","url":null,"abstract":"Recommender systems are complex. They integrate the individual needs of users with the characteristics of particular domains of application which may span items from large and potentially heterogeneous collections. Extensive experimentation is required to understand the multidimensional properties of recommendation algorithms and the fit between algorithm and application. librec-auto is a tool that automates many aspects of off-line batch recommender system experimentation. It has a large library of state-of-the-art and historical recommendation algorithms and a wide variety of evaluation metrics. It further supports the study of diversity and fairness in recommendation through the integration of re-ranking algorithms and fairness-aware metrics. It supports declarative configuration for reproducible experiment management and supports multiple forms of hyper-parameter optimization.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"173 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116942449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How to Leverage a Multi-layered Transformer Language Model for Text Clustering: an Ensemble Approach","authors":"Mira Ait-Saada, François Role, M. Nadif","doi":"10.1145/3459637.3482121","DOIUrl":"https://doi.org/10.1145/3459637.3482121","url":null,"abstract":"Pre-trained Transformer-based word embeddings are now widely used in text mining where they are known to significantly improve supervised tasks such as text classification, named entity recognition and question answering. Since the Transformer models create several different embeddings for the same input, one at each layer of their architecture, various studies have already tried to identify those of these embeddings that most contribute to the success of the above-mentioned tasks. In contrast the same performance analysis has not yet been carried out in the unsupervised setting. In this paper we evaluate the effectiveness of Transformer models on the important task of text clustering. In particular, we present a clustering ensemble approach that harnesses all the network's layers. Numerical experiments carried out on real datasets with different Transformer models show the effectiveness of the proposed method compared to several baselines.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117198564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SNPR","authors":"Mingwei Zhang, Yang Yang, Rizwan Abbas, Ke Deng, Jianxin Li, Bin Zhang","doi":"10.1145/3459637.3482394","DOIUrl":"https://doi.org/10.1145/3459637.3482394","url":null,"abstract":"Next Point-of-Interest (POI) recommendation plays an important role in location-based services. The state-of-the-art methods utilize recurrent neural networks (RNNs) to model users' check-in sequences and have shown promising results. However, they tend to recommend POIs similar to those that the user has often visited. As a result, users become bored with obvious recommendations. To address this issue, we propose Serendipity-oriented Next POI Recommendation model (SNPR), a supervised multi-task learning problem, with objective to recommend unexpected and relevant POIs only. To this end, we define the quantitative serendipity as a trade-off of relevance and unexpectedness in the context of next POI recommendation, and design a dedicated neural network with Transformer to capture complex interdependencies between POIs in user's check-in sequence. Extensive experimental results show that our model can improve relevance significantly while the unexpectedness outperforms the state-of-the-art serendipity-oriented recommendation methods.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"55 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120921861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vandalism Detection in OpenStreetMap via User Embeddings","authors":"Yinxiao Li, T. J. Anderson, Yiqi Niu","doi":"10.1145/3459637.3482213","DOIUrl":"https://doi.org/10.1145/3459637.3482213","url":null,"abstract":"OpenStreetMap (OSM) is a free and openly-editable database of geographic information. Over the years, OSM has evolved into the world's largest open knowledge base of geospatial data, and protecting OSM from the risk of vandalized and falsified information has become paramount to ensuring its continued success. However, despite the increasing usage of OSM and a wide interest in vandalism detection on open knowledge bases such as Wikipedia and Wikidata, OSM has not attracted as much attention from the research community, partially due to a lack of publicly available vandalism corpus. In this paper, we report on the construction of the first OSM vandalism corpus, and release it publicly. We describe a user embedding approach to create OSM user embeddings and add embedding features to a machine learning model to improve vandalism detection in OSM. We validate the model against our vandalism corpus, and observe solid improvements in key metrics. The validated model is deployed into production for vandalism detection on Daylight Map.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"117 11","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120967726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trilateral Spatiotemporal Attention Network for User Behavior Modeling in Location-based Search","authors":"Yi Qi, Ke Hu, Bo Zhang, Jia Cheng, Jun Lei","doi":"10.1145/3459637.3482206","DOIUrl":"https://doi.org/10.1145/3459637.3482206","url":null,"abstract":"In location-based search, user's click behavior is naturally bonded with trilateral spatiotemporal information, i.e., the locations of historical user requests, the locations of corresponding clicked items and the occurring time of historical clicks. Appropriate modeling of the trilateral spatiotemporal user click behavior sequence is key to the success of any location-based search service. Though abundant and helpful, existing user behavior modeling methods are insufficient for modeling the rich patterns in trilateral spatiotemporal sequence in that they ignore the interplay among request's geographic information, item's geographic information and the click time. In this work, we study the user behavior modeling problem in location-based search systematically. We propose TRISAN, short for Trilateral Spatiotemporal Attention Network, a novel attention-based neural model that incorporates temporal relatedness into both the modeling of item's geographic closeness and the modeling of request's geographic closeness through a fusion mechanism. In addition, we propose to model the geographic closeness both by distance and by semantic similarity. Extensive experiments demonstrate that the proposed method outperforms existing methods by a large margin and every part of our modeling strategy contributes to its final success.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124937437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast k-NN Graph Construction by GPU based NN-Descent","authors":"Hui Wang, Wanlei Zhao, Xiangxiang Zeng, Jianye Yang","doi":"10.1145/3459637.3482344","DOIUrl":"https://doi.org/10.1145/3459637.3482344","url":null,"abstract":"NN-Descent is a classic k-NN graph construction approach. It is still widely employed in machine learning, computer vision, and information retrieval tasks due to its efficiency and genericness. However, the current design only works well on CPU. In this paper, NN-Descent has been redesigned to adapt to the GPU architecture. A new graph update strategy called selective update is proposed. It significantly reduces the data exchange between GPU cores and GPU global memory, which is the processing bottleneck under the GPU computation architecture. This redesign leads to full exploitation of the parallelism of the GPU hardware. In the meantime, the genericness, as well as the simplicity of NN-Descent, are well-preserved. Moreover, a procedure that allows k-NN graphs to be merged efficiently on GPU is proposed. It makes the construction of high-quality k-NN graphs for out-of-GPU-memory datasets tractable. Our approach is 100-250× faster than the single-thread NN-Descent and is 2.5-5× faster than the existing GPU-based approaches, as tested on million-scale as well as billion-scale datasets.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125164448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ScarceGAN","authors":"S. Chakrabarty, Rukma Talwadker, Tridib Mukherjee","doi":"10.1145/3459637.3482474","DOIUrl":"https://doi.org/10.1145/3459637.3482474","url":null,"abstract":"This paper introduces ScarceGAN which focuses on identification of extremely rare or scarce samples from multi-dimensional longitudinal telemetry data with small and weak label prior. We specifically address: (i) severe scarcity in positive class, stemming from both underlying organic skew in the data, as well as extremely limited labels; (ii) multi-class nature of the negative samples, with uneven density distributions and partially overlapping feature distributions; and (iii) massively unlabelled data leading to tiny and weak prior on both positive and negative classes, and possibility of unseen or unknown behavior in the unlabelled set, especially in the negative class. Although related to PU learning problems, we contend that knowledge (or lack of it) on the negative class can be leveraged to learn the complement of it (i.e., the positive class) better in a semi-supervised manner. To this effect, ScarceGAN re-formulates semi-supervised GAN by accommodating weakly labelled multi-class negative samples and the available positive samples. It relaxes the supervised discriminator's constraint on exact differentiation between negative samples by introducing a 'leeway' term for samples with noisy prior. We propose modifications to the cost objectives of discriminator, in supervised and unsupervised path as well as that of the generator. For identifying risky players in skill gaming, this formulation in whole gives us a recall of over 85% (~60% jump over vanilla semi-supervised GAN) on our scarce class with very minimal verbosity in the unknown space. Further, ScarceGAN outperforms the recall benchmarks established by recent GAN based specialized models for the positive imbalanced class identification and establishes a new benchmark in identifying one of the rare attack classes (0.09%) in the intrusion dataset from the KDDCUP99 challenge. We establish ScarceGAN to be one of the new competitive benchmark frameworks in the rare class identification for longitudinal telemetry data.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123675514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge Graph Representation Learning as Groupoid: Unifying TransE, RotatE, QuatE, ComplEx","authors":"Han Yang, Junfei Liu","doi":"10.1145/3459637.3482442","DOIUrl":"https://doi.org/10.1145/3459637.3482442","url":null,"abstract":"Knowledge graph (KG) representation learning which aims to encode entities and relations into low-dimensional spaces, has been widely used in KG completion and link prediction. Although existing KG representation learning models have shown promising performance, the theoretical mechanism behind existing models is much less well-understood. It is challenging to accurately portray the internal connections between models and build a competitive model systematically. To overcome this problem, a unified KG representation learning framework, called GrpKG, is proposed in this paper to model the KG representation learning from a generic groupoid perspective. We discover that many existing models are essentially the same in the sense of groupoid isomorphism and further provide transformation methods between different models. Moreover, we explore the applications of GrpKG in the model classification as well as other processes. The experiments on several benchmark data sets validate the effectiveness and superiority of our framework by comparing two proposed models (GrpQ8 and GrpM2) with the state-of-the-art models.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125400071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Supervised Learning on Users' Spontaneous Behaviors for Multi-Scenario Ranking in E-commerce","authors":"Yulong Gu, Wentian Bao, Dan Ou, Xiang Li, Baoliang Cui, Biyu Ma, Haikuan Huang, Qingwen Liu, Xiaoyi Zeng","doi":"10.1145/3459637.3481953","DOIUrl":"https://doi.org/10.1145/3459637.3481953","url":null,"abstract":"Multi-scenario Learning to Rank is essential for Recommender Systems, Search Engines and Online Advertising in e-commerce portals where the ranking models are usually applied in many scenarios. However, existing works mainly focus on learning the ranking model for a single scenario, and pay less attention to learning ranking models for multiple scenarios. We identify two practical challenges in industrial multi-scenario ranking systems: (1) The Feedback Loop problem that the model is always trained on the items chosen by the ranker itself. (2) Insufficient training data for small and new scenarios. To address the above issues, we present ZEUS, a novel framework that learns a Zoo of ranking modEls for mUltiple Scenarios based on pre-training on users' spontaneous behaviors (e.g. queries which are directly searched in the search box and not recommended by the ranking system). ZEUS decomposes the training process into two stages: self-supervised learning based pre-training and fine-tuning. Firstly, ZEUS performs self-supervised learning on users' spontaneous behaviors and generates a pre-trained model. Secondly, ZEUS fine-tunes the pre-trained model on users' implicit feedback in multiple scenarios. Extensive experiments on Alibaba's production dataset demonstrate the effectiveness of ZEUS, which significantly outperforms state-of-the-art methods. On average, ZEUS achieves 6.0%, 9.7%, and 11.7% improvements in CTR, CVR, and GMV, respectively, over the state-of-the-art method.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126807648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RCES","authors":"Wei Li, Rishi Choudhary, A. Younus, Bruno Ohana, N. Baker, B. Leen, M. A. Qureshi","doi":"10.1145/3459637.3481990","DOIUrl":"https://doi.org/10.1145/3459637.3481990","url":null,"abstract":"To assist the COVID-19 focused researchers in life science and healthcare in understanding the pandemic, we present an exploratory information retrieval system called RCES. The system employs a previously developed EVE (Explainable Vector-based Embedding) model using DBpedia and an adopted model using MeSH taxonomies to exploit concept relations related to COVID-19. Various expansion methods are also developed, along with explanations and facets that collectively form rapid cues for a valuable navigational and informed user experience.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115045766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}