{"title":"Dynamic Early Exit Scheduling for Deep Neural Network Inference through Contextual Bandits","authors":"Weiyu Ju, Wei Bao, Liming Ge, Dong Yuan","doi":"10.1145/3459637.3482335","DOIUrl":"https://doi.org/10.1145/3459637.3482335","url":null,"abstract":"Recent advances in Deep Neural Networks (DNNs) have dramatically improved the accuracy of DNN inference, but have also introduced higher latency. In this paper, we investigate how to utilize early exit, a novel method that allows inference to exit at earlier exit points at the cost of an acceptable amount of accuracy. Scheduling the optimal exit point on a per-instance basis is challenging because the realized performance (i.e., confidence and latency) of each exit point is random and the statistics vary in different scenarios. Moreover, the performance has dependencies among the exit points, further complicating the problem. Therefore, the optimal exit scheduling decision cannot be known in advance but should be learned in an online fashion. To this end, we propose Dynamic Early Exit (DEE), a real-time online learning algorithm based on contextual bandit analysis. DEE observes the performance at each exit point as context and decides whether to exit or keep processing. Unlike standard contextual bandit analyses, the rewards of the decisions in our problem are temporally dependent. Furthermore, the performances of the earlier exit points are inevitably explored more compared to the later ones, which poses an unbalanced exploration-exploitation trade-off. DEE addresses these challenges, and its regret per inference asymptotically approaches zero. We compare DEE with four benchmark schemes in a real-world experiment. 
The experimental results show that DEE can improve overall performance by up to 98.1% compared to the best benchmark scheme.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126709524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-modal Dictionary BERT for Cross-modal Video Search in Baidu Advertising","authors":"Tan Yu, Yi Yang, Yi Li, Lin Liu, Mingming Sun, Ping Li","doi":"10.1145/3459637.3481937","DOIUrl":"https://doi.org/10.1145/3459637.3481937","url":null,"abstract":"Due to their attractiveness, video advertisements are adored by advertisers. Baidu, as one of the leading search advertisement platforms in China, is putting more and more effort into video advertisements for its advertisement customers. Search-based video advertisement display is, in essence, a cross-modal retrieval problem, which is normally tackled through joint embedding methods. Nevertheless, due to the lack of interactions between text features and image features, joint embedding methods cannot achieve as high accuracy as their attention-based counterparts. Inspired by the great success achieved by BERT in NLP tasks, many cross-modal BERT models have emerged and achieved excellent performance in cross-modal retrieval. Last year, Baidu also launched a cross-modal BERT, CAN, on its video advertisement platform, and achieved considerably better performance than the previous joint-embedding model. In this paper, we present our recent work for video advertisement retrieval, the Multi-modal Dictionary BERT (MDBERT) model. Compared with CAN and other cross-modal BERT models, MDBERT integrates a joint dictionary, which is shared among video features and word features. It maps the relevant word features and video features into the same codeword and thus fosters effective cross-modal attention. To support end-to-end training, we propose to soften the codeword assignment. Meanwhile, to enhance the inference efficiency, we adopt product quantization to achieve fine-level feature space partition at a low cost. 
After launching MDBERT in Baidu video advertising platform, the conversion ratio (CVR) increases by 3.34%, bringing a considerable revenue boost for advertisers in Baidu.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126842149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Explicit and Implicit Feature Interactions via Information Sharing for Parallel Deep CTR Models","authors":"Bo Chen, Yichao Wang, Zhirong Liu, Ruiming Tang, Wei Guo, Hongkun Zheng, Weiwei Yao, Muyu Zhang, Xiuqiang He","doi":"10.1145/3459637.3481915","DOIUrl":"https://doi.org/10.1145/3459637.3481915","url":null,"abstract":"Effectively modeling feature interactions is crucial for CTR prediction in industrial recommender systems. The state-of-the-art deep CTR models with parallel structure (e.g., DCN) learn explicit and implicit feature interactions through independent parallel networks. However, these models suffer from trivial sharing issues, namely insufficient sharing in hidden layers and excessive sharing in network input, limiting the model's expressiveness and effectiveness. Therefore, to enhance information sharing between explicit and implicit feature interactions, we propose a novel deep CTR model, EDCN. EDCN introduces two advanced modules, namely a bridge module and a regulation module, which work collaboratively to capture the layer-wise interactive signals and learn discriminative feature distributions for each hidden layer of the parallel networks. Furthermore, the two modules are lightweight and model-agnostic, and can be generalized well to mainstream parallel deep CTR models. Extensive experiments and studies are conducted to demonstrate the effectiveness of EDCN on two public datasets and one industrial dataset. 
Moreover, the compatibility of two modules over various parallel-structured models is verified, and they have been deployed onto the online advertising platform in Huawei, where a one-month A/B test demonstrates the improvement over the base parallel-structured model by 7.30% and 4.85% in terms of CTR and eCPM, respectively.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126249949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SCMGR","authors":"Huanyu Liu, Ruifang He, Liangliang Zhao, Haocheng Wang, Ruifang Wang","doi":"10.1145/3459637.3482476","DOIUrl":"https://doi.org/10.1145/3459637.3482476","url":null,"abstract":"Social summarization aims to produce a concise summary that describes the core content of a collection of posts on a specific topic. Existing methods tend to produce sparse or ambiguous representations of posts because they use only short and informal text content. Recent studies use social relations to improve the diversity of summaries, yet they model social relations as a regularization term, which has poor flexibility and generalization. Those methods cannot capture the deep semantic and social interactions among posts, so summaries still suffer from redundancy. We propose to use Social Context and Multi-Granularity Relations (SCMGR) to improve unsupervised social summarization. It learns more informative representations of posts considering both text semantics and social structure information without any annotated data. First, we design two sociologically motivated meta-paths to construct a social context graph among posts, and adopt a graph convolutional network to aggregate social context information from neighbors. Second, we design a multi-granularity relation decoder to capture the deeper semantic and social interactions from post-word and post-post aspects respectively, which can provide guidance for summary selection from semantic and social structure perspectives. Finally, a sparse reconstruction-based extractor is used to select posts that can best reconstruct original content and social network structure as summaries. Our approach improves the coverage and diversity of summaries. 
Experimental results on both English and Chinese corpora prove the effectiveness of our model.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122285446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding the Property of Long Term Memory for the LSTM with Attention Mechanism","authors":"Wendong Zheng, Putian Zhao, Kai Huang, Gang Chen","doi":"10.1145/3459637.3482399","DOIUrl":"https://doi.org/10.1145/3459637.3482399","url":null,"abstract":"Recent trends of incorporating LSTM network with different attention mechanisms in time series forecasting have led researchers to consider the attention module as an essential component. While existing studies revealed the effectiveness of attention mechanism with some visualization experiments, the underlying rationale behind their outstanding performance on learning long-term dependencies remains hitherto obscure. In this paper, we aim to elaborate on this fundamental question by conducting a thorough investigation of the memory property for LSTM network with attention mechanism. We present a theoretical analysis of LSTM integrated with attention mechanism, and demonstrate that it is capable of generating an adaptive decay rate which dynamically controls the memory decay according to the obtained attention score. In particular, our theory shows that attention mechanism brings significantly slower decays than the exponential decay rate of a standard LSTM. 
Experimental results on four real-world time series datasets demonstrate the superiority of the attention mechanism for maintaining long-term memory when compared to the state-of-the-art methods, and further corroborate our theoretical analysis.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"30 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120894288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HierST: A Unified Hierarchical Spatial-temporal Framework for COVID-19 Trend Forecasting","authors":"Shun Zheng, Zhifeng Gao, Wei Cao, Jiang Bian, Tie-Yan Liu","doi":"10.1145/3459637.3481927","DOIUrl":"https://doi.org/10.1145/3459637.3481927","url":null,"abstract":"The outbreak of the COVID-19 pandemic has largely influenced the world and our normal daily lives. To combat this pandemic efficiently, governments usually need to coordinate essential resources across multiple regions and adjust intervention policies at the right time, which all call for accurate and robust forecasting of future epidemic trends. However, designing such a forecasting system is non-trivial, since we need to handle all kinds of locations at different administrative levels, which exhibit quite different epidemic-evolving patterns. Moreover, there are dynamic and volatile correlations of pandemic conditions among these locations, which further increase the difficulty of forecasting. With these challenges in mind, we develop a novel spatial-temporal forecasting framework. First, to accommodate all kinds of locations at different administrative levels, we propose a unified hierarchical view, which mimics the aggregation procedure of pandemic statistics. Then, this view motivates us to facilitate joint learning across administrative levels and inspires us to design the cross-level consistency loss as an extra regularization to stabilize model training. Besides, to capture those dynamic and volatile spatial correlations, we design a customized spatial module with adaptive edge gates, which can both reinforce effective messages and disable irrelevant ones. We put this framework into production to help the battle against COVID-19 in the United States. A comprehensive online evaluation across three months demonstrates that our projections are the most competitive ones among all results produced by dozens of international groups and even surpass the official ensemble in many cases. 
We also visualize our unique edge gates to understand the evolution of spatial correlations and present intuitive case studies. Besides, we open source our implementation at https://github.com/dolphin-zs/HierST to facilitate future research towards better epidemic modeling.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116140283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Formal Analysis of Recommendation Quality of Adversarially-trained Recommenders","authors":"V. W. Anelli, Yashar Deldjoo, Tommaso Di Noia, Felice Antonio Merra","doi":"10.1145/3459637.3482046","DOIUrl":"https://doi.org/10.1145/3459637.3482046","url":null,"abstract":"Recommender systems (RSs) employ user-item feedback, e.g., ratings, to match customers to personalized lists of products. Approaches to top-k recommendation mainly rely on Learning-To-Rank algorithms and, among them, the most widely adopted is Bayesian Personalized Ranking (BPR), which is based on a pair-wise optimization approach. Recently, BPR has been found vulnerable to adversarial perturbations of its model parameters. Adversarial Personalized Ranking (APR) mitigates this issue by robustifying BPR via an adversarial training procedure. The empirical improvements of APR's accuracy performance over BPR have led to its wide use in several recommender models. However, a key overlooked aspect has been the beyond-accuracy performance of APR, i.e., novelty, coverage, and amplification of popularity bias, considering that recent results suggest that BPR, the building block of APR, is sensitive to the intensification of biases and reduction of recommendation novelty. In this work, we model the learning characteristics of the BPR and APR optimization frameworks to give mathematical evidence that, when the feedback data have a tailed distribution, APR amplifies the popularity bias more than BPR due to an unbalanced number of received positive updates from short-head items. Using matrix factorization (MF), we empirically validate the theoretical results by performing preliminary experiments on two public datasets to compare BPR-MF and APR-MF performance on accuracy and beyond-accuracy metrics. 
The experimental results consistently show the degradation of novelty and coverage measures and a worrying amplification of bias.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"159 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116591267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tabular Functional Block Detection with Embedding-based Agglomerative Cell Clustering","authors":"Kexuan Sun, Fei Wang, Muhao Chen, J. Pujara","doi":"10.1145/3459637.3482484","DOIUrl":"https://doi.org/10.1145/3459637.3482484","url":null,"abstract":"Tables are a widely-used format for data curation. The diversity of domains, layouts, and content of tables makes knowledge extraction challenging. Understanding table layouts is an important step for automatically harvesting knowledge from tabular data. Since table cells are spatially organized into regions, correctly identifying such regions and inferring their functional roles, referred to as functional block detection, is a critical part of understanding table layouts. Earlier functional block detection approaches fail to leverage spatial relationships and higher-level structure, either depending on cell-level predictions or relying on data types as signals for identifying blocks. In this paper, we introduce a flexible functional block detection method by applying agglomerative clustering techniques which merge smaller blocks into larger blocks using two merging strategies. Our proposed method uses cell embeddings with a customized dissimilarity function which utilizes local and margin distances, as well as block coherence metrics to capture cell, block, and table scoped features. Given the diversity of tables in real-world corpora, we also introduce a sampling-based approach for automatically tuning distance thresholds for each table. 
Experimental results show that our method improves over the earlier state-of-the-art method in terms of several evaluation metrics.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116617902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Advertising Incrementality Testing: Practical Lessons And Emerging Challenges","authors":"Joel Barajas, Narayan L. Bhamidipati, J. Shanahan","doi":"10.1145/3459637.3482031","DOIUrl":"https://doi.org/10.1145/3459637.3482031","url":null,"abstract":"Online advertising has historically been approached as an ad-to-user matching problem within sophisticated optimization algorithms. As the research and ad-tech industries have progressed, advertisers have increasingly emphasized the causal effect estimation of their ads (incrementality) using controlled experiments (A/B testing). With low lift effects and sparse conversion, the development of incrementality testing platforms at scale suggests tremendous engineering challenges in measurement precision. Similarly, the correct interpretation of results addressing a business goal requires significant data science and experimentation research expertise. We propose a practical tutorial in the incrementality testing landscape, including: The business need; Literature solutions and industry practices; Designs in the development of testing platforms; The testing cycle, case studies, and recommendations. 
We provide first-hand lessons based on the development of such a platform in a major combined DSP and ad network, and after running several tests for up to two months each over recent years.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124876016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fair Graph Mining","authors":"Jian Kang, Hanghang Tong","doi":"10.1145/3459637.3482030","DOIUrl":"https://doi.org/10.1145/3459637.3482030","url":null,"abstract":"In today's increasingly connected world, graph mining plays a pivotal role in many real-world application domains, including social network analysis, recommendations, marketing and financial security. Tremendous efforts have been made to develop a wide range of computational models. However, recent studies have revealed that many widely-applied graph mining models could suffer from potential discrimination. Fairness on graph mining aims to develop strategies in order to mitigate bias introduced/amplified during the mining process. The unique challenges of enforcing fairness on graph mining include (1) theoretical challenge on non-IID nature of graph data, which may invalidate the basic assumption behind many existing studies in fair machine learning, and (2) algorithmic challenge on the dilemma of balancing model accuracy and fairness. This tutorial aims to (1) present a comprehensive review of state-of-the-art techniques in fairness on graph mining and (2) identify the open challenges and future trends. In particular, we start with reviewing the background, problem definitions, unique challenges and related problems; then we will focus on an in-depth overview of (1) recent techniques in enforcing group fairness, individual fairness and other fairness notions in the context of graph mining, and (2) future directions in studying algorithmic fairness on graphs. 
We believe this tutorial could be attractive to researchers and practitioners in areas including data mining, artificial intelligence, social science and beneficial to a plethora of real-world application domains.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124930884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}