{"title":"Domain Adaptation Through Cluster Integration and Correlation","authors":"Vishnu Manasa Devagiri, V. Boeva, Shahrooz Abghari","doi":"10.1109/ICDMW58026.2022.00025","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00025","url":null,"abstract":"Domain shift is a common problem in many real-world applications using machine learning models. Most of the existing solutions are based on supervised and deep-learning models. This paper proposes a novel clustering algorithm capable of producing an adapted and/or integrated clustering model for the considered domains. Source and target domains are represented by clustering models such that each cluster of a domain models a specific scenario of the studied phenomenon by defining a range of allowable values for each attribute in a given data vector. The proposed domain integration algorithm works in two steps: (i) cross-labeling and (ii) integration. Initially, each clustering model is applied to label the cluster representatives of the other model. These labels are used to determine the correlations between the two models and to identify the clusters common to both domains, which are integrated in the second step. Different features of the proposed algorithm are studied and evaluated on a publicly available human activity recognition (HAR) data set and real-world data from a smart logistics use case provided by an industrial partner. The goal of the experiments on the HAR data set is to showcase the algorithm's potential in automatic data labeling, while the experiments on the smart logistics use case evaluate and compare the performance of the integrated and two adapted models in different domains.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127338379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Graph Neural Network with Learnable Permutation Pooling","authors":"Yu Jin, J. JáJá","doi":"10.1109/ICDMW58026.2022.00094","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00094","url":null,"abstract":"Graph neural networks (GNN) have achieved great success in various graph-related applications. Most existing graph neural network models follow the message-passing neural network (MPNN) paradigm where the graph pooling function forms a critical component that directly determines the model effectiveness. In this paper, we propose PermPool, a new graph pooling function that provably improves the GNN model expressiveness. The method is based on the insight that the distribution of node permutations, when defined properly, forms a characteristic encoding of graphs. We propose to express graph representations as the expectation of node permutations with a general pooling function. We show that the graph representation remains invariant to node reordering and has stronger expressive power than MPNN models. In addition, we propose novel permutation modeling and sampling techniques that integrate PermPool into differentiable neural network models. Empirical results show that our method outperforms other pooling methods in benchmark graph classification tasks.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115593478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simplifying Process Navigations - Divide and Rule way","authors":"Sai Charan Emmadi, Satya Samudrala, Parag Agrawal, M. Natu","doi":"10.1109/ICDMW58026.2022.00020","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00020","url":null,"abstract":"Enterprises rely heavily on their batch processes to ensure smooth business operations. These processes contain thousands of jobs and millions of inter-dependencies. This makes it very difficult to track failures and delays, assess impact, and take timely corrective actions. Hence, it becomes very important to create logically independent groups of processes, so that it is easy to navigate, visualize, and analyze large complex processes, and highlight the areas that need attention. We present a greedy approach to find the logical groups that best meet the objective function and constraints related to batch systems. The proposed approach is implemented and used by various customers. We have validated the proposed approach on real-world customer data.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116442949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cascaded Multi-Class Network Intrusion Detection With Decision Tree and Self-attentive Model","authors":"Yuchen Lan, Tram Truong-Huu, Ji-Yan Wu, S. Teo","doi":"10.1109/ICDMW58026.2022.00081","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00081","url":null,"abstract":"Network intrusion has become a leading threat to the security of Internet applications. With the reemergence of artificial intelligence, deep neural networks (DNN) have been widely used for network intrusion detection. However, one main problem with DNN models is their dependency on sufficient high-quality labeled data to train the model to achieve decent accuracy. DNN models may incur many false predictions on imbalanced intrusion datasets, especially on the minority classes. While we continue advocating for using machine learning and deep learning for network intrusion detection, we aim at addressing the drawback of existing DNN models by effectively integrating a decision tree and a feature tokenizer (FT)-transformer. First, the decision tree algorithm is used for the binary classification of regular (normal) traffic and malicious traffic. Second, the FT-transformer performs multi-category classification on the malicious traffic to identify the type of attack traffic. We conduct the performance evaluation using three publicly available datasets: CIC-IDS 2017, UNSW-NB15, and Kitsune. Experimental results show that among the three datasets, the proposed technique achieves the best performance on the CIC-IDS 2017 dataset with macro precision, recall, and F1-score of 84.6%, 83.6%, and 93.2%, respectively.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129042131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Solving Non-linear Optimization Problem in Engineering by Model-Informed Generative Adversarial Network (MI-GAN)","authors":"Yuxuan Li, Chaoyue Zhao, Chenang Liu","doi":"10.1109/ICDMW58026.2022.00035","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00035","url":null,"abstract":"Optimization models have been widely used in many engineering systems to solve problems related to system operation and management. For instance, in power systems, the optimal power flow (OPF) problem, which is a critical component of power system operations, can be formulated using optimization models. Specifically, the alternating current OPF (AC-OPF) problems are challenging since some of the constraints are non-linear and non-convex. Moreover, due to the high variability that the power system may have, the coefficients of the optimization model may change, increasing the difficulty of solving the OPF problem. Although conventional optimization tools and deep learning approaches have been investigated, the feasibility and optimality of the solutions may still be unsatisfactory. Hence, in this paper, based on the recently developed model-informed generative adversarial network (MI-GAN) framework, a tailored version for solving the non-linear AC-OPF problem under uncertainties is proposed. The contributions of this work can be summarized in two main aspects: (1) to ensure the feasibility and improve the optimality of the generated solutions, two important layers, namely the feasibility-filter layer and the optimality-filter layer, are designed; and (2) an efficient model-informed selector is designed and integrated into the GAN architecture by incorporating these two new layers to inform the generator. Experiments on the IEEE test systems demonstrate the efficacy and potential of the proposed method for solving non-linear AC-OPF problems.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126793797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Next POI Recommender System: Multi-view Representation Learning for Outstanding Performance in Various Context","authors":"Yeonghwan Jeon, Junhyung Kim","doi":"10.1109/ICDMW58026.2022.00150","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00150","url":null,"abstract":"Location-based Social Networks (LBSNs) are software services that enable a user to find knowledge and to socialize with other users by offering other users' content (e.g., reviews, photos, etc.) to the user. LBSNs have many sub-fields, but Point-of-Interest (POI) recommendation is the most important because it is related to the growth of Small and Medium Enterprises (SMEs) through increased visitation rates. Generally, POI recommendation should be able to respond to the various contexts of users. These contexts are varied and complex, but we define three main contexts based on user behavior in the local domain. However, each context is defined by different user behavior, so models and their performance differ across evaluation criteria. In other words, no model is outstanding in all contexts. Therefore, this paper introduces how to define each context, how to build POI embeddings for recommendation with an empirical multi-view representation learning technique, and how to build an optimized POI embedding that performs outstandingly in all contexts of POI recommendation, for various downstream tasks.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116321863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Edit distance with Quasi Real Penalties: a hybrid distance for network-constrained trajectories","authors":"Noudéhouénou Lionel Jaderne Houssou, Jean-Loup Guillaume, A. Prigent","doi":"10.1109/ICDMW58026.2022.00136","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00136","url":null,"abstract":"In this paper, we propose a new distance for network-constrained trajectories named Edit distance with Quasi Real Penalties (EQRP). Depending on the case, it can compare trajectories as non-ordered sets and as sequences, while other distances only compare trajectories as non-ordered sets or as sequences. Moreover, it is parameter-free, manages local time shifting, and respects the triangle inequality; to the best of our knowledge, no other distance simultaneously satisfies these three properties expected from a trajectory distance. To demonstrate the pertinence of our idea, we benchmark our distance against some state-of-the-art distances for network-constrained trajectories. Specifically, for each distance, we determine its capability to precisely identify similar trajectories. We also determine their respective performance for trajectory clustering. Our results show the predominance of EQRP over the existing edit distances and, in some cases, a more precise ability to evaluate the dissimilarity between network-constrained trajectories compared to other measures.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126458438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nearest neighbors with incremental learning for real-time forecasting of electricity demand","authors":"Laura Melgar-García, David Gutiérrez-Avilés, Cristina Rubio-Escudero, A. T. Lora","doi":"10.1109/ICDMW58026.2022.00112","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00112","url":null,"abstract":"Electricity demand forecasting is very useful for the different actors involved in the energy sector to plan the supply chain (generation, storage and distribution of energy). Nowadays energy demand data are streaming data coming from smart meters and have to be processed in real time for more efficient demand management. In addition, this kind of data can present changes over time such as new patterns, new trends, etc. Therefore, real-time forecasting algorithms have to adapt and adjust to data arriving online in order to provide timely and accurate responses. This work presents a new algorithm for electricity demand forecasting in real time. The proposed algorithm generates a prediction model based on the K-nearest neighbors algorithm, which is incrementally updated as online data arrives. Both time-frequency and error-threshold based model updates have been evaluated. Results using Spanish electricity demand data with a ten-minute sampling frequency are reported; the best prediction model, obtained when the update is daily, reaches 2% error.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128202799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Abnormal Entity-Aware Knowledge Graph Completion","authors":"Keyi Sun, Shuo Yu, Ciyuan Peng, Xiang Li, Mehdi Naseriparsa, Feng Xia","doi":"10.1109/ICDMW58026.2022.00118","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00118","url":null,"abstract":"In real-world scenarios, knowledge graphs remain incomplete and contain abnormal information, such as redundancies, contradictions, inconsistencies, misspellings, and abnormal values. These shortcomings in knowledge graphs potentially affect service quality in many applications. Although many approaches have been proposed to perform knowledge graph completion, they are incapable of handling the abnormal information in knowledge graphs. Therefore, to address the abnormal information issue for the knowledge graph completion task, we design a novel knowledge graph completion framework called ABET, which specifically focuses on abnormal entities. ABET consists of two components: a) abnormal entity prediction and b) knowledge graph completion. First, the prediction component automatically predicts the abnormal entities in knowledge graphs. Then, the completion component effectively captures the heterogeneous structural information and the high-order features of neighbours based on different relations. Experiments demonstrate that ABET is an effective knowledge graph completion framework that achieves significant improvements over baselines. We further verify that ABET is robust for the knowledge graph completion task with abnormal entities.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131047946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Influence Maximization on Hypergraphs via Similarity-based Diffusion","authors":"M. E. Aktas, Sidra Jawaid, Ihsan Gokalp, Esra Akbas","doi":"10.1109/ICDMW58026.2022.00158","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00158","url":null,"abstract":"Influence maximization is an important problem in network science that aims to detect critical structures, such as nodes and interactions, with a higher influence on diffusion. It has applications in information spreading, rumor controlling, marketing, disease spreading, advertising, and more. Although the influence maximization problem in graphs has been studied extensively, only a few studies explore critical structures in hypergraphs, and these studies mostly focus on detecting influential nodes rather than higher-order interactions, i.e., hyperedges. In this paper, we study the influential hyperedge detection problem. We first design diffusion models on hypergraphs based on the similarity between hyperedges. Our claim here is that similarity between hyperedges is positively correlated with the diffusion process. To study this claim, we first calculate similarity scores between hyperedges and construct similarity-based hypergraph Laplacians. Next, we extend standard graph centrality measures for hyperedges using these Laplacians. We compare the similarity-based hypergraph Laplacians with the state-of-the-art influential hyperedge detection method using two evaluation metrics: the size of the giant component and the Susceptible-Infected-Recovered (SIR) simulation model. Our experimental results suggest that overall, similarity-based Laplacians are more effective than the state-of-the-art method in finding influential higher-order hyperedges.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133357932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}