Mingzhe Xing, Shuqing Bian, Wayne Xin Zhao, Zhen Xiao, Xinji Luo, Cunxiang Yin, Jing Cai, Yancheng He
{"title":"Learning Reliable User Representations from Volatile and Sparse Data to Accurately Predict Customer Lifetime Value","authors":"Mingzhe Xing, Shuqing Bian, Wayne Xin Zhao, Zhen Xiao, Xinji Luo, Cunxiang Yin, Jing Cai, Yancheng He","doi":"10.1145/3447548.3467079","DOIUrl":"https://doi.org/10.1145/3447548.3467079","url":null,"abstract":"In industry, customer lifetime value (LTV) prediction is a challenging task, since user consumption data is usually volatile, noisy, or sparse. To address these issues, this paper presents a novel Temporal-Structural User Representation (named TSUR) network to predict LTV. We utilize historical revenue time series and user attributes to learn temporal and structural user representations, respectively. Specifically, the temporal representation is learned with a temporal trend encoder based on a novel multi-channel Discrete Wavelet Transform (DWT) module, while the structural representation is derived with a Graph Attention Network (GAT) on an attribute similarity graph. Furthermore, a novel cluster-alignment regularization method is employed to align and enhance these two kinds of representations. In essence, this fusion can be viewed as associating the temporal and structural representations in the low-pass representation space, which also helps prevent data noise from being transferred across views. To our knowledge, this is the first time that temporal and structural user representations are jointly learned for LTV prediction. 
Extensive offline experiments on two large-scale real-world datasets and online A/B tests have shown the superiority of our approach over a number of competitive baselines.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122541470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TrajNet: A Trajectory-Based Deep Learning Model for Traffic Prediction","authors":"Bo Hui, Dan Yan, Haiquan Chen, Wei-Shinn Ku","doi":"10.1145/3447548.3467236","DOIUrl":"https://doi.org/10.1145/3447548.3467236","url":null,"abstract":"Ridesharing companies such as Uber and DiDi provide ride-hailing services where passengers and drivers are matched via mobile apps. As a result, large amounts of vehicle trajectories and vehicle speed data are collected that can be used for traffic prediction. The recent popularity of graph convolutional networks (GCNs) has opened up new possibilities for real-time traffic prediction and many GCN-based models have been proposed to capture the spatial correlation on the urban road network. However, the graph-based approaches fail to capture the intricate dependencies of consecutive road segments that are well captured by trajectories. Instead of proposing yet another GCN-based model for traffic prediction, we propose a novel deep learning model that treats vehicle trajectories as first-class citizens. Our model, called TrajNet, captures the spatial dependency of traffic flow by propagating information along real trajectories. To improve training efficiency, we organize the multiple trajectories in a batch used for training with a trie structure, to reuse shared computation. TrajNet uses a spatial attention mechanism to adaptively capture the dynamic correlations between different road segments, and dilated causal convolution to capture long-range temporal dependency. We also resolve the inconsistency between the fine-grained road segment coverage by trajectories, and the ground-truth traffic data that are coarse-grained, following a trajectory-based refinement framework. 
Extensive experiments on real traffic datasets validate the performance superiority of TrajNet over the state-of-the-art GCN-based models.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123902744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Third International Workshop on Smart Data for Blockchain and Distributed Ledger (SDBD2021): Joint Workshop with SIGKDD 2021 Trust Day","authors":"Feida Zhu, Jian Pei","doi":"10.1145/3447548.3469441","DOIUrl":"https://doi.org/10.1145/3447548.3469441","url":null,"abstract":"Today's computing is characterized by an increasing degree of complexity, comprehensiveness and collaboration. The complexity can be observed in the wide application of gigantic models with a huge number of parameters and structures of an unprecedented level of sophistication. The comprehensiveness is best illustrated by the high heterogeneity of data both in terms of format and source. The collaboration, finally, becomes an obvious trend as computing systems grow more open and decentralized, with various entities interacting to achieve collective intelligence in the presence of potentially malicious behavior. Trust, therefore, has become critical at multiple levels: at the model level, to assure integrity, fairness and interpretability; at the data level, to safeguard data quality, compliance and privacy; and at the system level, to govern resilience, performance and incentive. Moreover, the notion of trust has long been discussed in different domains in both academia and industry with differing definitions and understandings. 
The Third International Workshop on Smart Data for Blockchain and Distributed Ledger (SDBD'21) will be held as a joint workshop with the special-themed \"Trust Day\" of KDD 2021, which aims to bring together researchers, practitioners and experts from various communities to exchange and explore ideas, frontiers, opportunities and challenges under the broad theme of \"trust\" in a highly interdisciplinary manner.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121385397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haishuai Wang, Zhao Li, Peng Zhang, Jiaming Huang, Pengrui Hui, Jian Liao, Ji Zhang, Jiajun Bu
{"title":"Live-Streaming Fraud Detection: A Heterogeneous Graph Neural Network Approach","authors":"Haishuai Wang, Zhao Li, Peng Zhang, Jiaming Huang, Pengrui Hui, Jian Liao, Ji Zhang, Jiajun Bu","doi":"10.1145/3447548.3467065","DOIUrl":"https://doi.org/10.1145/3447548.3467065","url":null,"abstract":"Live-streaming platforms have recently gained significant popularity by attracting an increasing number of young users and have become a very promising form of online shopping. Similar to the traditional online shopping platforms such as Taobao, live-streaming platforms also suffer from online malicious fraudulent behaviors where many transactions are not genuine. The existing anti-fraud models proposed to recognize fraudulent transactions on traditional online shopping platforms are inapplicable on live-streaming platforms. This is mainly because live-streaming platforms are characterized by a unique type of heterogeneous live-streaming networks where multiple heterogeneous types of nodes such as users, live-streamers, and products are connected with multiple different types of edges associated with edge features. In this paper, we propose a new approach based on a heterogeneous graph neural network for LIve-streaming Fraud dEtection (called LIFE). LIFE designs an innovative heterogeneous graph learning model that fully utilizes various heterogeneous information of shopping transactions, users, streamers, and items from a given live-streaming platform. Moreover, a label propagation algorithm is employed within our LIFE framework to handle the limited number of labeled fraudulent transactions for model training. Extensive experimental results on a large-scale Taobao live-streaming platform demonstrate that the proposed method is superior to the baseline models in terms of fraud detection effectiveness on live-streaming platforms. 
Furthermore, we conduct a case study to show that the proposed method is able to effectively detect fraud communities for live-streaming e-commerce platforms.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128827832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xun Zhou, Liang Zhao, Zhe Jiang, R. Stewart, S. Shekhar, Jieping Ye
{"title":"DeepSpatial'21: 2nd International Workshop on Deep Learning for Spatiotemporal Data, Applications, and Systems","authors":"Xun Zhou, Liang Zhao, Zhe Jiang, R. Stewart, S. Shekhar, Jieping Ye","doi":"10.1145/3447548.3469446","DOIUrl":"https://doi.org/10.1145/3447548.3469446","url":null,"abstract":"With the advancement of GPS and remote sensing technologies and the pervasiveness of smartphones and mobile devices, large amounts of spatiotemporal data are being collected from various domains. Knowledge discovery from spatiotemporal data is crucial in broad societal applications. Examples range from mapping flooded areas on satellite imagery for disaster response to monitoring crop health for food security, from estimating travel time between locations on Google Maps to forecasting hotspots of diseases like Covid-19 in public health. The recent success in deep learning technologies in computer vision and natural language processing provides unique opportunities for spatiotemporal data mining (e.g., automatically extracting spatial contextual features without manual feature engineering) but also faces unique challenges (e.g., spatial autocorrelation, heterogeneity, multiple scales, and resolutions, the existence of domain knowledge and constraints). This workshop provides a premium platform for researchers from both academia and industry to exchange ideas on opportunities, challenges, and cutting-edge techniques of deep learning for spatiotemporal data. 
We hope to inspire novel ideas and visions through the workshop and facilitate the development of this emerging research area.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115860479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Farhan Asif Chowdhury, M. A. Siddiquee, G. Baker, A. Mueen
{"title":"FASER: Seismic Phase Identifier for Automated Monitoring","authors":"Farhan Asif Chowdhury, M. A. Siddiquee, G. Baker, A. Mueen","doi":"10.1145/3447548.3467064","DOIUrl":"https://doi.org/10.1145/3447548.3467064","url":null,"abstract":"Seismic phase identification classifies the type of seismic wave received at a station based on the waveform (i.e., time series) recorded by a seismometer. Automated phase identification is an integrated component of large-scale seismic monitoring applications, including earthquake warning systems and underground explosion monitoring. Accurate, fast, and fine-grained phase identification is instrumental for earthquake location estimation, understanding Earth's crustal and mantle structure for predictive modeling, etc. However, existing operational systems utilize multiple nearby stations for precise identification, which delays response time with added complexity and manual interventions. Moreover, single-station systems mostly perform coarse phase identification. In this paper, we revisit seismic phase classification as an integrated part of a seismic processing pipeline. We develop a machine-learned model, FASER, that takes input from a signal detector and produces phase types as output for a signal associator. The model is a combination of convolutional and long short-term memory networks. Our method identifies finer wave types, including crustal and mantle phases. We conduct comprehensive experiments on real datasets to show that FASER outperforms existing baselines. 
We evaluate FASER holding out sources and stations across the world to demonstrate consistent performance for novel sources and stations.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"158 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114239989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning to Walk across Time for Interpretable Temporal Knowledge Graph Completion","authors":"Jaehun Jung, Jinhong Jung, U. Kang","doi":"10.1145/3447548.3467292","DOIUrl":"https://doi.org/10.1145/3447548.3467292","url":null,"abstract":"Static knowledge graphs (KGs), despite their wide usage in relational reasoning and downstream tasks, fall short of realistic modeling of knowledge and facts that are only temporarily valid. Compared to static knowledge graphs, temporal knowledge graphs (TKGs) inherently reflect the transient nature of real-world knowledge. Naturally, automatic TKG completion has drawn much research interest for more realistic modeling of relational reasoning. However, most of the existing models for TKG completion extend static KG embeddings that do not fully exploit TKG structure, thus lacking in 1) accounting for temporally relevant events already residing in the local neighborhood of a query, and 2) path-based inference that facilitates multi-hop reasoning and better interpretability. In this paper, we propose T-GAP, a novel model for TKG completion that maximally utilizes both temporal information and graph structure in its encoder and decoder. T-GAP encodes query-specific substructure of TKG by focusing on the temporal displacement between each event and the query timestamp, and performs path-based inference by propagating attention through the graph. Our empirical experiments demonstrate that T-GAP not only achieves superior performance against state-of-the-art baselines, but also competently generalizes to queries with unseen timestamps. 
Through extensive qualitative analyses, we also show that T-GAP enjoys transparent interpretability, and follows human intuition in its reasoning process.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114425741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Johannes Huegle, C. Hagedorn, M. Perscheid, H. Plattner
{"title":"MPCSL - A Modular Pipeline for Causal Structure Learning","authors":"Johannes Huegle, C. Hagedorn, M. Perscheid, H. Plattner","doi":"10.1145/3447548.3467082","DOIUrl":"https://doi.org/10.1145/3447548.3467082","url":null,"abstract":"The examination of causal structures is crucial for data scientists in a variety of machine learning application scenarios. In recent years, the corresponding interest in methods of causal structure learning has led to a wide spectrum of independent implementations, each having specific accuracy characteristics and introducing implementation-specific overhead in the runtime. Hence, considering a selection of algorithms or different implementations in different programming languages utilizing different hardware setups becomes a tedious manual task with high setup costs. Consequently, a tool that lets users plug existing methods from different libraries into a single system to compare and evaluate their results would substantially support data scientists in their research efforts. In this work, we propose an architectural blueprint of a pipeline for causal structure learning and outline our reference implementation MPCSL that addresses the requirements towards platform independence and modularity while ensuring the comparability and reproducibility of experiments. 
Moreover, we demonstrate the capabilities of MPCSL within a case study, where we evaluate existing implementations of the well-known PC-Algorithm concerning their runtime performance characteristics.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121023277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bingyu Liu, Yuhong Guo, Jianan Jiang, Jian-Bo Tang, Weihong Deng
{"title":"Multi-view Correlation based Black-box Adversarial Attack for 3D Object Detection","authors":"Bingyu Liu, Yuhong Guo, Jianan Jiang, Jian-Bo Tang, Weihong Deng","doi":"10.1145/3447548.3467432","DOIUrl":"https://doi.org/10.1145/3447548.3467432","url":null,"abstract":"Deep neural networks have made tremendous progress in 3D object detection, which is an important task especially in autonomous driving scenarios. Benefiting from the breakthroughs in deep learning and sensor technologies, 3D object detection methods based on different sensors, such as camera and LiDAR, have developed rapidly. Meanwhile, a growing body of research notes that the abundant information contained in multi-view data can be used to obtain a more accurate understanding of the 3D surrounding environment. Therefore, many sensor-fusion 3D object detection methods have been proposed. As safety is critical in autonomous driving and deep neural networks are known to be vulnerable to adversarial examples with visually imperceptible perturbations, it is important to investigate adversarial attacks for 3D object detection. Recent works have shown that both image-based and LiDAR-based networks can be attacked by adversarial examples, while attacks on sensor-fusion models, which tend to be more robust, have not been studied. To this end, we propose a simple multi-view correlation based adversarial attack method for camera-LiDAR fusion 3D object detection models and focus on the black-box attack setting, which is more practical in real-world systems. Specifically, we first design a generative network to generate image adversarial examples based on an auxiliary image semantic segmentation network. Then, we develop a cross-view perturbation projection method by exploiting the camera-LiDAR correlations to map each image adversarial example to the space of the point cloud data to form the point cloud adversarial examples in the LiDAR view. 
Extensive experiments on the KITTI dataset demonstrate the effectiveness of the proposed method.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127334675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hao Wang, C. Liu, Zipeng Dai, Jian Tang, Guoren Wang
{"title":"Energy-Efficient 3D Vehicular Crowdsourcing for Disaster Response by Distributed Deep Reinforcement Learning","authors":"Hao Wang, C. Liu, Zipeng Dai, Jian Tang, Guoren Wang","doi":"10.1145/3447548.3467070","DOIUrl":"https://doi.org/10.1145/3447548.3467070","url":null,"abstract":"Fast and efficient access to environmental and life data is key to successful disaster response. Vehicular crowdsourcing (VC) by a group of unmanned vehicles (UVs), such as drones and unmanned ground vehicles, to collect these data from Points-of-Interest (PoIs), e.g., possible survivor spots and fire sites, provides an efficient way to assist disaster rescue. In this paper, we explicitly consider navigating a group of UVs in a 3-dimensional (3D) disaster workzone to maximize the amount of collected data, geographical fairness, and energy efficiency, while minimizing data dropout due to limited transmission rate. We propose DRL-DisasterVC(3D), a distributed deep reinforcement learning framework, with a repetitive experience replay (RER) to improve learning efficiency, and a clipped target network to increase learning stability. We also use a 3D convolutional neural network (3D CNN) with multi-head-relational attention (MHRA) for spatial modeling, and add auxiliary pixel control (PC) for spatial exploration. 
We design a novel disaster response simulator, called \"DisasterSim\", and conduct extensive experiments to show that DRL-DisasterVC(3D) outperforms all five baselines in terms of energy efficiency when varying the numbers of UVs and PoIs and the SNR threshold.","PeriodicalId":421090,"journal":{"name":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125079865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}