{"title":"Efficient Compression Method for Roadside LiDAR Data","authors":"Md. Parvez Mollah, Biplob K. Debnath, Murugan Sankaradas, S. Chakradhar, A. Mueen","doi":"10.1145/3511808.3557144","DOIUrl":"https://doi.org/10.1145/3511808.3557144","url":null,"abstract":"Roadside LiDAR (Light Detection and Ranging) sensors have recently been explored for intelligent transportation systems aiming at safer and faster traffic management and vehicular operations. A key challenge in such systems is to efficiently transfer massive point-cloud data from the roadside LiDAR devices to the edge connected through a 5G network for real-time processing. In this paper, we consider the problem of compressing roadside (i.e. static) LiDAR data in real time, which presents a unique condition unexplored by current methods. Existing point-cloud compression methods assume moving LiDARs (that are mounted on vehicles) and do not exploit spatial consistency across frames over time. To this end, we develop a novel grouped wavelet technique for static roadside LiDAR data compression (i.e. SLiC). Our method compresses LiDAR data both spatially and temporally using a kd-tree data structure based on Haar wavelet coefficients. Experimental results show that SLiC can compress up to 1.9× more effectively than the state-of-the-art compression method. Moreover, SLiC is computationally more efficient, achieving a 2× improvement in bandwidth usage over the best alternative. 
Even with this impressive gain in communication and storage efficiency, SLiC retains down-the-pipeline application's accuracy.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131596376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hyperbolic-to-Hyperbolic User Representation with Multi-aspect for Social Recommendation","authors":"Hang Zhang, Hao Wang, Guifeng Wang, Jia-Yin Liu, Qi Liu","doi":"10.1145/3511808.3557532","DOIUrl":"https://doi.org/10.1145/3511808.3557532","url":null,"abstract":"Social recommender systems play a key role in solving the problem of information overload. In order to better extract latent hierarchical properties in the data, they usually explore the user-user connections and user-item interactions in hyperbolic space. Existing methods resort to tangent spaces to realize some operations (e.g., matrix multiplication) on hyperbolic manifolds. However, frequently projecting between the hyperbolic space and the tangent space will destroy the global structure of the manifold and reduce the accuracy of predictions. Besides, decisions made by users are often influenced by multi-aspect potential preferences, which are usually represented as a vector for each user. To this end, we design a novel hyperbolic-to-hyperbolic user representation with multi-aspect social recommender system, namely H2HMSR, which directly works in hyperbolic space. 
Extensive experiments on three public datasets demonstrate that our model can adequately extract social information of users with multi-aspect preferences and outperforms hyperbolic and Euclidean counterparts.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132553614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Curriculum Contrastive Learning for Fake News Detection","authors":"Jiachen Ma, Yong Liu, Meng Liu, Meng Han","doi":"10.1145/3511808.3557574","DOIUrl":"https://doi.org/10.1145/3511808.3557574","url":null,"abstract":"Due to the rapid spread of fake news on social media, society and the economy have been negatively affected in many ways. How to effectively identify fake news is a challenging problem that has received great attention from academia and industry. Existing deep learning methods for fake news detection require a large amount of labeled data to train the model, but obtaining labeled data is a time-consuming and labor-intensive process. To extract useful information from a large amount of unlabeled data, some contrastive learning methods for fake news detection have been proposed. However, existing contrastive learning methods only randomly sample negative samples at different training stages, so the negative samples are not fully exploited. Intuitively, gradually increasing the contrastive difficulty of negative samples, in a way similar to human learning, will help improve the performance of the model. Inspired by the idea of curriculum learning, we propose a curriculum contrastive model (CCFD) for fake news detection which automatically selects and trains negative samples with different difficulty at different training stages. Furthermore, we also propose three new augmentation methods which consider the importance of edges and node attributes in the propagation structure to obtain more effective positive samples. 
The experimental results on three public datasets show that our model CCFD outperforms the existing state-of-the-art models for fake news detection.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132566586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Context-Enhanced Generate-then-Evaluate Framework for Chinese Abbreviation Prediction","authors":"Hanwen Tong, Chenhao Xie, Jiaqing Liang, Qi He, Zhiang Yue, Jingping Liu, Yanghua Xiao, Wenguang Wang","doi":"10.1145/3511808.3557219","DOIUrl":"https://doi.org/10.1145/3511808.3557219","url":null,"abstract":"As a popular form of lexicalization, abbreviation is widely used in both oral and written language and plays an important role in various Natural Language Processing applications. However, current approaches cannot ensure that the predicted abbreviation preserves the meaning of its full form and maintains fluency. In this paper, we introduce a fresh perspective to evaluate the quality of abbreviations within their textual contexts with a pre-trained language model. To this end, we propose a novel two-stage generate-then-evaluate framework enhanced by context, which consists of a generation model to generate multiple candidate abbreviations and an evaluation model to evaluate their quality within their contexts. Experimental results show that our framework consistently outperforms all the existing approaches, achieving 53.2% Hit@1 performance, a 5.6-point improvement over the previous best result. Our code and data are publicly available at https://github.com/HavenTong/CEGE.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132593017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Aspect Embedding of Dynamic Graphs","authors":"Aimin Sun, Zhiguo Gong","doi":"10.1145/3511808.3557650","DOIUrl":"https://doi.org/10.1145/3511808.3557650","url":null,"abstract":"Graph embedding is regarded as one of the most advanced techniques for graph data analyses due to its significant performance. However, the majority of existing works only focus on static graphs while ignoring the ubiquitous dynamic graphs. In fact, the temporal evolution of edges in a dynamic graph sets a harsh challenge for the traditional embedding algorithms. To solve the problem, in this paper we propose a Dynamic Graph Multi-Aspect Embedding (DGMAE) to automatically learn the proper number of aspects and their distributions in each temporal duration based on a distance dependent Chinese Restaurant Process. The proposed method can encode the inherent property of varying interactions among nodes along the time and present different aspect-influences to nodes embedding. Our extensive experiments on several public datasets show the performance improvement over state-of-the-art works.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132790108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Item Inconsistency Signal Learning for Sequence Denoising in Sequential Recommendation","authors":"Chi Zhang, Yantong Du, Xiangyu Zhao, Qilong Han, R. Chen, Li Li","doi":"10.1145/3511808.3557348","DOIUrl":"https://doi.org/10.1145/3511808.3557348","url":null,"abstract":"Sequential recommender systems aim to recommend the next items in which target users are most interested based on their historical interaction sequences. In practice, historical sequences typically contain some inherent noise (e.g., accidental interactions), which hinders learning accurate sequence representations and thus misleads the next-item recommendation. However, the absence of supervised signals (i.e., labels indicating noisy items) makes the problem of sequence denoising rather challenging. To this end, we propose a novel sequence denoising paradigm for sequential recommendation by learning hierarchical item inconsistency signals. More specifically, we design a hierarchical sequence denoising (HSD) model, which first learns two levels of inconsistency signals in input sequences, and then generates noiseless subsequences (i.e., dropping inherent noisy items) for subsequent sequential recommenders. It is noteworthy that HSD is flexible enough to accommodate supervised item signals, if any, and can be seamlessly integrated with most existing sequential recommendation models to boost their performance. Extensive experiments on five public benchmark datasets demonstrate the superiority of HSD over state-of-the-art denoising methods and its applicability to a wide variety of mainstream sequential recommendation models. 
The implementation code is available at https://github.com/zc-97/HSD","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132913745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simulating Complex Problems Inside a Database","authors":"G. Fissore, N. Vasiloglou","doi":"10.1145/3511808.3557520","DOIUrl":"https://doi.org/10.1145/3511808.3557520","url":null,"abstract":"The standard way to store and interact with the large amount of data that are central to the functioning of any modern business is through the use of a relational Knowledge Graph Management System (KGMS). In this paper we show how the relational model can be successfully exploited to model complex analytic scenarios while enjoying the same characteristics of clarity and flexibility as when modeling the data themselves. Using the Rel language, we simulate the daily schedule of an airline company as an agent-based system, and we show how modeling this system through a set of relationships and logical rules lets us focus directly on the inherent complexity of our model, taking away most of the incidental effort in actually implementing our simulation.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133639294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Federated Data Preparation, Learning, and Debugging in Apache SystemDS","authors":"Sebastian Baunsgaard, Matthias Boehm, Kevin Innerebner, Mito Kehayov, F. Lackner, Olga Ovcharenko, Arnab Phani, Tobias Rieger, David Weissteiner, Sebastian Benjamin Wrede","doi":"10.1145/3511808.3557162","DOIUrl":"https://doi.org/10.1145/3511808.3557162","url":null,"abstract":"Federated learning allows training machine learning (ML) models without central consolidation of the raw data. Variants of such federated learning systems enable privacy-preserving ML, and address data ownership and/or sharing constraints. However, existing work mostly adopts data-parallel parameter-server architectures for mini-batch training, requires manual construction of federated runtime plans, and largely ignores the broad variety of data preparation, ML algorithms, and model debugging. Over the last years, we extended Apache SystemDS with an additional federated runtime backend for federated linear-algebra programs, federated parameter servers, and federated data preparation. In this paper, we share the system-level compiler and runtime integration, new features such as multi-tenant federated learning, selected federated primitives, multi-key homomorphic encryption, and our monitoring infrastructure. 
Our demonstrator showcases how composite ML pipelines can be compiled into federated runtime plans with low overhead.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133652860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"STARDOM: Semantic Aware Deep Hierarchical Forecasting Model for Search Traffic Prediction","authors":"Yucheng Lu, Qiang Ji, Liang Wang, Tianshu Wu, Hongbo Deng, Jian Xu, Bo Zheng","doi":"10.1145/3511808.3557102","DOIUrl":"https://doi.org/10.1145/3511808.3557102","url":null,"abstract":"We study the search traffic forecasting problem for guaranteed search advertising (GSA) applications in e-commerce platforms. Consumers express their purchase intents by posing queries to the e-commerce search engine. GSA is a type of guaranteed delivery (GD) advertising strategy, which forecasts the traffic of search queries, and charges the advertisers according to the predicted volumes of search queries the advertisers are willing to buy. We employ the time series forecasting method to make the search traffic prediction. Different from existing time series prediction methods, search queries are semantically meaningful, with semantically similar queries possessing similar time series. They can also be grouped according to the brands or categories they belong to, exhibiting hierarchical structures. To fully take advantage of these characteristics, we design a SemanTic AwaRe Deep hierarchical fOrecasting Model (STARDOM for short) which explores the queries' semantic information and the hierarchical structures formed by the queries. Specifically, to exploit hierarchical structure, we propose a reconciliation learning module. It leverages a deep learning model to learn the reconciliation relation between the hierarchical series in the latent space automatically, and enforces the coherence constraints through a distill reconciliation loss. To exploit semantic information, we propose a semantic representation module and generate semantic aware series embeddings for queries. 
Extensive experiments are conducted to confirm the effectiveness of the proposed method.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"95 3-4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132194035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cognitive Diagnosis Focusing on Knowledge Concepts","authors":"Sheng Li, Quanlong Guan, Liangda Fang, Fang Xiao, Zhenyu He, Yizhou He, Weiqi Luo","doi":"10.1145/3511808.3557096","DOIUrl":"https://doi.org/10.1145/3511808.3557096","url":null,"abstract":"Cognitive diagnosis is a crucial task in the field of educational measurement and psychology, which aims to diagnose the strengths and weaknesses of participants. Existing cognitive diagnosis methods only consider which knowledge concepts are involved in the knowledge components of exercises, but ignore the fact that different knowledge concepts have different effects on practice scores in actual learning situations. Therefore, researchers need to reshape the learning scene by combining the multi-factor relationships between knowledge components. In this paper, in order to more comprehensively simulate the interaction between students and exercises, we developed a neural network-based CDMFKC model for cognitive diagnosis. Our method not only captures the nonlinear interaction between exercise characteristics, student performance, and their mastery of each knowledge concept, but also further considers the impact of knowledge concepts by designing the difficulty and discrimination of knowledge concepts, and uses multiple neural layers to model their interaction so as to obtain accurate and interpretable diagnostic results. In addition, we propose an improved CDMFKC model with guessing and slipping parameters designed by knowledge concept proficiency and student proficiency vectors. We validate the performance of these two diagnostic models on six real datasets. 
The experimental results show that the two models perform better in terms of accuracy, rationality, and interpretability.","PeriodicalId":389624,"journal":{"name":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management","volume":"281 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134461433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}