{"title":"BhBF: A Bloom Filter Using Bh Sequences for Multi-set Membership Query","authors":"Shuyu Pei, Kun Xie, Xin Wang, Gaogang Xie, Kenli Li, Wei Li, Yanbiao Li, Jigang Wen","doi":"10.1145/3502735","DOIUrl":"https://doi.org/10.1145/3502735","url":null,"abstract":"Multi-set membership query is a fundamental issue for network functions such as packet processing and state machines monitoring. Given the rigid query speed and memory requirements, it would be promising if a multi-set query algorithm can be designed based on Bloom filter (BF), a space-efficient probabilistic data structure. However, existing efforts on multi-set query based on BF suffer from at least one of the following drawbacks: low query speed, low query accuracy, limitation in only supporting insertion and query operations, or limitation in the set size. To address the issues, we design a novel Bh sequence-based Bloom filter (BhBF) for multi-set query, which supports four operations: insertion, query, deletion, and update. In BhBF, the set ID is encoded as a code in a Bh sequence. Exploiting good properties of Bh sequences, we can correctly decode the BF cells to obtain the set IDs even when the number of hash collisions is high, which brings high query accuracy. In BhBF, we propose two strategies to further speed up the query speed and increase the query accuracy. On the theoretical side, we analyze the false positive and classification failure rate of our BhBF. Our results from extensive experiments over two real datasets demonstrate that BhBF significantly advances state-of-the-art multi-set query algorithms.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116270873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shikha Singh, É. Chouzenoux, G. Chierchia, A. Majumdar
{"title":"Multi-label Deep Convolutional Transform Learning for Non-intrusive Load Monitoring","authors":"Shikha Singh, É. Chouzenoux, G. Chierchia, A. Majumdar","doi":"10.1145/3502729","DOIUrl":"https://doi.org/10.1145/3502729","url":null,"abstract":"The objective of this letter is to propose a novel computational method to learn the state of an appliance (ON / OFF) given the aggregate power consumption recorded by the smart-meter. We formulate a multi-label classification problem where the classes correspond to the appliances. The proposed approach is based on our recently introduced framework of convolutional transform learning. We propose a deep supervised version of it relying on an original multi-label cost. Comparisons with state-of-the-art techniques show that our proposed method improves over the benchmarks on popular non-intrusive load monitoring datasets.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"83 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124465938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MBN: Towards Multi-Behavior Sequence Modeling for Next Basket Recommendation","authors":"Yanyan Shen, Baoyuan Ou, Ranzhen Li","doi":"10.1145/3497748","DOIUrl":"https://doi.org/10.1145/3497748","url":null,"abstract":"Next basket recommendation aims at predicting the next set of items that a user would likely purchase together, which plays an important role in e-commerce platforms. Unlike conventional item recommendation, the next basket recommendation focuses on capturing item correlations among baskets and learning the user’s temporal interest from the past purchasing basket sequence. In practice, most users interact with items in various kinds of behaviors. The multi-behavior data sheds light on user’s potential purchasing intention and resolves noisy signals from accidentally purchased items. In this article, we conduct an empirical study on real datasets to exploit the characteristics of multi-behavior data and confirm its positive effects on next basket recommendation. We develop a novel Multi-Behavior Network (MBN) model that captures item correlations and acquires meta-knowledge from multi-behavior basket sequences effectively. MBN employs the meta multi-behavior sequence encoder to model temporal dependencies of each individual behavior and extract meta-knowledge across different behaviors. Furthermore, we design the recurring-item-aware predictor in MBN to realize the high degree of the repeated occurrences of items, leading to better recommendation performance. We conduct extensive experiments to evaluate the performance of our proposed MBN model using real-world multi-behavior data. The results demonstrate the superior recommendation performance of MBN compared with various state-of-the-art methods.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126231120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph-Enhanced Spatial-Temporal Network for Next POI Recommendation","authors":"Zhaobo Wang, Yanmin Zhu, Qiaomei Zhang, Haobing Liu, Chunyang Wang, Tong Liu","doi":"10.1145/3513092","DOIUrl":"https://doi.org/10.1145/3513092","url":null,"abstract":"The task of next Point-of-Interest (POI) recommendation aims at recommending a list of POIs for a user to visit at the next timestamp based on his/her previous interactions, which is valuable for both location-based service providers and users. Recent state-of-the-art studies mainly employ recurrent neural network (RNN) based methods to model user check-in behaviors according to user’s historical check-in sequences. However, most of the existing RNN-based methods merely capture geographical influences depending on physical distance or successive relation among POIs. They are insufficient to capture the high-order complex geographical influences among POI networks, which are essential for estimating user preferences. To address this limitation, we propose a novel Graph-based Spatial Dependency modeling (GSD) module, which focuses on explicitly modeling complex geographical influences by leveraging graph embedding. GSD captures two types of geographical influences, i.e., distance-based and transition-based influences from designed POI semantic graphs. Additionally, we propose a novel Graph-enhanced Spatial-Temporal network (GSTN), which incorporates user spatial and temporal dependencies for next POI recommendation. Specifically, GSTN consists of a Long Short-Term Memory (LSTM) network for user-specific temporal dependencies modeling and GSD for user spatial dependencies learning. Finally, we evaluate the proposed model using three real-world datasets. Extensive experiments demonstrate the effectiveness of GSD in capturing various geographical influences and the improvement of GSTN over state-of-the-art methods.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128620282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Distance Function Optimization for the Near Neighbors-Based Classifiers","authors":"M. Jiřina, Said Krayem","doi":"10.1145/3434769","DOIUrl":"https://doi.org/10.1145/3434769","url":null,"abstract":"Based on the analysis of conditions for a good distance function we found four rules that should be fulfilled. Then, we introduce two new distance functions, a metric and a pseudometric one. We have tested how they fit for distance-based classifiers, especially for the IINC classifier. We rank distance functions according to several criteria and tests. Rankings depend not only on criteria or nature of the statistical test, but also whether it takes into account different difficulties of tasks or whether it considers all tasks as equally difficult. We have found that the new distance functions introduced belong among the four or five best out of 23 distance functions. We have tested them on 24 different tasks, using the mean, the median, the Friedman aligned test, and the Quade test. Our results show that a suitable distance function can improve behavior of distance-based classification rules.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123989560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asymmetric Multi-Task Learning with Local Transference","authors":"Saullo H. G. Oliveira, A. Gonçalves, F. von Zuben","doi":"10.1145/3514252","DOIUrl":"https://doi.org/10.1145/3514252","url":null,"abstract":"In this article, we present the Group Asymmetric Multi-Task Learning (GAMTL) algorithm that automatically learns from data how tasks transfer information among themselves at the level of a subset of features. In practice, for each group of features GAMTL extracts an asymmetric relationship supported by the tasks, instead of assuming a single structure for all features. The additional flexibility promoted by local transference in GAMTL allows any two tasks to have multiple asymmetric relationships. The proposed method leverages the information present in these multiple structures to bias the training of individual tasks towards more generalizable models. The solution to the GAMTL’s associated optimization problem is an alternating minimization procedure involving tasks parameters and multiple asymmetric relationships, thus guiding to convex smaller sub-problems. GAMTL was evaluated on both synthetic and real datasets. To evidence GAMTL versatility, we generated a synthetic scenario characterized by diverse profiles of structural relationships among tasks. GAMTL was also applied to the problem of Alzheimer’s Disease (AD) progression prediction. Our experiments indicated that the proposed approach not only increased prediction performance, but also estimated scientifically grounded relationships among multiple cognitive scores, taken here as multiple regression tasks, and regions of interest in the brain, directly associated here with groups of features. We also employed stability selection analysis to investigate GAMTL’s robustness to data sampling rate and hyper-parameter configuration. GAMTL source code is available on GitHub: https://github.com/shgo/gamtl.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124000502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph Neural News Recommendation with User Existing and Potential Interest Modeling","authors":"Zhaopeng Qiu, Yunfan Hu, Xian Wu","doi":"10.1145/3511708","DOIUrl":"https://doi.org/10.1145/3511708","url":null,"abstract":"Personalized news recommendations can alleviate the information overload problem. To enable personalized recommendation, one critical step is to learn a comprehensive user representation to model her/his interests. Many existing works learn user representations from the historical clicked news articles, which reflect their existing interests. However, these approaches ignore users’ potential interests and pay less attention to news that may interest the users in the future. To address this problem, we propose a novel Graph neural news Recommendation model with user Existing and Potential interest modeling, named GREP. Different from existing works, GREP introduces three modules to jointly model users’ existing and potential interests: (1) Existing Interest Encoding module mines user historical clicked news and applies the multi-head self-attention mechanism to capture the relatedness among the news; (2) Potential Interest Encoding module leverages the graph neural network to explore the user potential interests on the knowledge graph; and (3) Bi-directional Interaction module dynamically builds a news-entity bipartite graph to further enrich two interest representations. Finally, GREP combines the existing and potential interest representations to represent the user and leverages a prediction layer to estimate the clicking probability of the candidate news. Experiments on two real-world large-scale datasets demonstrate the state-of-the-art performance of GREP.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"113 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132364225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quality-Informed Process Mining: A Case for Standardised Data Quality Annotations","authors":"Kanika Goel, S. Leemans, Niels Martin, M. Wynn","doi":"10.1145/3511707","DOIUrl":"https://doi.org/10.1145/3511707","url":null,"abstract":"Real-life event logs, reflecting the actual executions of complex business processes, are faced with numerous data quality issues. Extensive data sanity checks and pre-processing are usually needed before historical data can be used as input to obtain reliable data-driven insights. However, most of the existing algorithms in process mining, a field focusing on data-driven process analysis, do not take any data quality issues or the potential effects of data pre-processing into account explicitly. This can result in erroneous process mining results, leading to inaccurate, or misleading conclusions about the process under investigation. To address this gap, we propose data quality annotations for event logs, which can be used by process mining algorithms to generate quality-informed insights. Using a design science approach, requirements are formulated, which are leveraged to propose data quality annotations. Moreover, we present the “Quality-Informed visual Miner” plug-in to demonstrate the potential utility and impact of data quality annotations. Our experimental results, utilising both synthetic and real-life event logs, show how the use of data quality annotations by process mining techniques can assist in increasing the reliability of performance analysis results.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128250610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-relation Graph Summarization","authors":"Xiangyu Ke, Arijit Khan, F. Bonchi","doi":"10.1145/3494561","DOIUrl":"https://doi.org/10.1145/3494561","url":null,"abstract":"Graph summarization is beneficial in a wide range of applications, such as visualization, interactive and exploratory analysis, approximate query processing, reducing the on-disk storage footprint, and graph processing in modern hardware. However, the bulk of the literature on graph summarization surprisingly overlooks the possibility of having edges of different types. In this article, we study the novel problem of producing summaries of multi-relation networks, i.e., graphs where multiple edges of different types may exist between any pair of nodes. Multi-relation graphs are an expressive model of real-world activities, in which a relation can be a topic in social networks, an interaction type in genetic networks, or a snapshot in temporal graphs. The first approach that we consider for multi-relation graph summarization is a two-step method based on summarizing each relation in isolation, and then aggregating the resulting summaries in some clever way to produce a final unique summary. In doing this, as a side contribution, we provide the first polynomial-time approximation algorithm based on the k-Median clustering for the classic problem of lossless single-relation graph summarization. Then, we demonstrate the shortcomings of these two-step methods, and propose holistic approaches, both approximate and heuristic algorithms, to compute a summary directly for multi-relation graphs. In particular, we prove that the approximation bound of k-Median clustering for the single relation solution can be maintained in a multi-relation graph with proper aggregation operation over adjacency matrices corresponding to its multiple relations. Experimental results and case studies (on co-authorship networks and brain networks) validate the effectiveness and efficiency of the proposed algorithms.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115741918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Sowah, B. Kuditchar, Godfrey A. Mills, A. Acakpovi, Ralph A. Twum, Gifty Buah, R. Agboyi
{"title":"HCBST: An Efficient Hybrid Sampling Technique for Class Imbalance Problems","authors":"R. Sowah, B. Kuditchar, Godfrey A. Mills, A. Acakpovi, Ralph A. Twum, Gifty Buah, R. Agboyi","doi":"10.1145/3488280","DOIUrl":"https://doi.org/10.1145/3488280","url":null,"abstract":"Class imbalance problem is prevalent in many real-world domains. It has become an active area of research. In binary classification problems, imbalance learning refers to learning from a dataset with a high degree of skewness to the negative class. This phenomenon causes classification algorithms to perform woefully when predicting positive classes with new examples. Data resampling, which involves manipulating the training data before applying standard classification techniques, is among the most commonly used techniques to deal with the class imbalance problem. This article presents a new hybrid sampling technique that improves the overall performance of classification algorithms for solving the class imbalance problem significantly. The proposed method called the Hybrid Cluster-Based Undersampling Technique (HCBST) uses a combination of the cluster undersampling technique to under-sample the majority instances and an oversampling technique derived from Sigma Nearest Oversampling based on Convex Combination, to oversample the minority instances to solve the class imbalance problem with a high degree of accuracy and reliability. The performance of the proposed algorithm was tested using 11 datasets from the National Aeronautics and Space Administration Metric Data Program data repository and University of California Irvine Machine Learning data repository with varying degrees of imbalance. Results were compared with classification algorithms such as the K-nearest neighbours, support vector machines, decision tree, random forest, neural network, AdaBoost, naïve Bayes, and quadratic discriminant analysis. Tests results revealed that for the same datasets, the HCBST performed better with average performances of 0.73, 0.67, and 0.35 in terms of performance measures of area under curve, geometric mean, and Matthews Correlation Coefficient, respectively, across all the classifiers used for this study. The HCBST has the potential of improving the performance of the class imbalance problem, which by extension, will improve on the various applications that rely on the concept for a solution.","PeriodicalId":435653,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data (TKDD)","volume":"251 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116721133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}