K. Yawata, Yoshihiro Osakabe, Takuya Okuyama, A. Asahara
{"title":"QUBO Decision Tree: Annealing Machine Extends Decision Tree Splitting","authors":"K. Yawata, Yoshihiro Osakabe, Takuya Okuyama, A. Asahara","doi":"10.1109/ICKG55886.2022.00052","DOIUrl":"https://doi.org/10.1109/ICKG55886.2022.00052","url":null,"abstract":"This paper proposes an extension of regression trees by quadratic unconstrained binary optimization (QUBO). Regression trees are very popular prediction models that are trainable with tabular datasets, but their accuracy is insufficient because the decision rules are too simple. The proposed method extends the decision rules in decision trees to multi-dimensional boundaries. Such an extension is generally unimplementable because of computational limitations; however, the proposed method transforms the training process into a QUBO problem, which an annealing machine can then solve.","PeriodicalId":278067,"journal":{"name":"2022 IEEE International Conference on Knowledge Graph (ICKG)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125302641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Audrey Der, Chin-Chia Michael Yeh, R. Wu, Junpeng Wang, Yan Zheng, Zhongfang Zhuang, Liang Wang, Wei Zhang, Eamonn J. Keogh
{"title":"Matrix Profile XXVII: A Novel Distance Measure for Comparing Long Time Series","authors":"Audrey Der, Chin-Chia Michael Yeh, R. Wu, Junpeng Wang, Yan Zheng, Zhongfang Zhuang, Liang Wang, Wei Zhang, Eamonn J. Keogh","doi":"10.1109/ICKG55886.2022.00013","DOIUrl":"https://doi.org/10.1109/ICKG55886.2022.00013","url":null,"abstract":"The most useful data mining primitives are distance measures. With an effective distance measure, it is possible to perform classification, clustering, anomaly detection, segmentation, etc. For single-event time series Euclidean Distance and Dynamic Time Warping distance are known to be extremely effective. However, for time series containing cyclical behaviors, the semantic meaningfulness of such comparisons is less clear. For example, on two separate days the telemetry from an athlete's workout routine might be very similar. However, on the second day she might have changed the order in which she did push-ups and squats, added a few repetitions of pull-ups, or completely omitted dumbbell curls. Any one of these minor changes would defeat existing time series distance measures. Some “bag-of-features” methods have been proposed to address this problem; however, we argue that in many cases, similarity is intimately tied to the shapes of subsequences within these longer time series. In such cases, summative features will lack discrimination ability. In this work we introduce PRCIS, which stands for Pattern Representation Comparison in Series. PRCIS is a distance measure for long time series, which exploits recent progress in our ability to summarize time series with “dictionaries”. 
We will demonstrate the utility of our ideas on diverse tasks and datasets.","PeriodicalId":278067,"journal":{"name":"2022 IEEE International Conference on Knowledge Graph (ICKG)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129828630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Giorgi Kvernadze, P. A. Sudyanti, N. Subedi, M. Hajiaghayi
{"title":"Two is Better Than One: Dual Embeddings for Complementary Product Recommendations","authors":"Giorgi Kvernadze, P. A. Sudyanti, N. Subedi, M. Hajiaghayi","doi":"10.1109/ICKG55886.2022.00024","DOIUrl":"https://doi.org/10.1109/ICKG55886.2022.00024","url":null,"abstract":"Embedding-based product recommendations have gained popularity in recent years due to their ability to integrate easily into large-scale systems and to support nearest-neighbor searches in real time. The bulk of studies in this area has predominantly focused on similar item recommendations. Research on complementary item recommendations, on the other hand, remains considerably under-explored. We define similar items as items that are interchangeable in terms of their utility and complementary items as items that serve different purposes, yet are compatible when used with one another. In this paper, we apply a novel approach to finding complementary items by leveraging dual embedding representations for products. We demonstrate that the notion of relatedness discovered in NLP for skip-gram negative sampling (SGNS) models translates effectively to the concept of complementarity when training item representations using co-purchase data. Since sparsity of purchase data is a major challenge in real-world scenarios, we further augment the model using synthetic samples to extend coverage. This allows the model to provide complementary recommendations for items that do not share co-purchase data by leveraging other abundantly available data modalities such as images, text, and clicks. We establish the effectiveness of our approach in improving both coverage and quality of recommendations on real-world data for a major online retail company. We further show the importance of task-specific hyperparameter tuning in training SGNS. 
Our model is effective yet simple to implement, making it a great candidate for generating complementary item recommendations at any e-commerce website.","PeriodicalId":278067,"journal":{"name":"2022 IEEE International Conference on Knowledge Graph (ICKG)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124580076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Graph Variational Autoencoder for Short Text Topic Modeling with Mutual Information Maximization","authors":"Yuhang Ge, Xuegang Hu","doi":"10.1109/ICKG55886.2022.00016","DOIUrl":"https://doi.org/10.1109/ICKG55886.2022.00016","url":null,"abstract":"Neural topic models can successfully capture thematic patterns of the document with black-box variational inference, but they still suffer from the sparsity problem when facing short texts with limited contextual information. To alleviate the sparsity problem, some graph-based methods have been proposed to explicitly model the word co-occurrence patterns. However, they ignore sequential information and word relevance degree in the document, resulting in inaccurate topic representations. Therefore, we propose a novel graph-based neural topic model, namely mutual Information enhanced Graph Topic Model (InfoGTM), which leverages the sequential information and incorporates the word relevance degree into topic modeling using a more accurate semantic graph. More specifically, instead of pre-computing statistical word co-occurrence, we develop an automatic way to dynamically construct a semantic graph with a multi-head attention mechanism, which integrates both contextual and word structure information into the semantic graph, thereby providing more accurate word co-occurrence information. After that, a graph variational auto-encoder topic modeling framework is adopted to generate topic proportions for each short text. To further enhance the topic representation, we maximize the mutual information between input words and topic representations to ensure more semantic information could be compressed. Besides, mutual information maximization could preserve the smooth manifold structure of short texts, which enables the spread of similar topic representations from neighboring short texts. 
Substantial experiments are conducted on several benchmark data sets that verify the superiority of our method compared to state-of-the-art methods with regard to topic coherence performance.","PeriodicalId":278067,"journal":{"name":"2022 IEEE International Conference on Knowledge Graph (ICKG)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115850186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Storage Scheme for Sustainable Development Goals Data Over Distributed Knowledge Graph Stores","authors":"Irene Kilanioti, G. A. Papadopoulos","doi":"10.1109/ICKG55886.2022.00023","DOIUrl":"https://doi.org/10.1109/ICKG55886.2022.00023","url":null,"abstract":"The achievement of the Sustainable Development Goals (SDGs) is important in order to ensure a world worth living in for future generations. Digitization and the plethora of data available for analysis offer new opportunities to support and monitor the achievement of the SDGs. Scholars can contribute to the achievement of the SDGs by guiding the actions of practitioners based on the analysis of data, as intended by this work. In this paper, we propose dimensionality reduction methods to semantically cluster new uncategorised SDG data and novel indicators, and efficiently place them in the environment of a distributed knowledge graph store. In particular, our work proposes and experimentally corroborates the use of Hilbert Space Filling Curves (HSFCs) to efficiently store real SDG data with reduced retrieval times and preservation of their semantic closeness. First, the algorithm is theoretically founded and explained, and an approach for data classification of entrant-indicators is described. Then, a thorough case study in a distributed knowledge graph environment experimentally evaluates our algorithm. The results are presented and discussed in light of theory along with the actual impact they can have for practitioners analysing SDG data, including intergovernmental organizations, government agencies and social welfare organizations. Our approach empowers SDG knowledge graphs for causal analysis, inference, and manifold interpretations of the societal implications of SDG-related actions, as data are accessed in reduced retrieval times. 
It facilitates quicker measurement of influence of users and communities on specific goals and serves for faster distributed knowledge matching, as semantic cohesion of data is preserved.","PeriodicalId":278067,"journal":{"name":"2022 IEEE International Conference on Knowledge Graph (ICKG)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134095182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Kernel-FM: A Multi-Embedding & Kernelization Factorization Machine Framework for CTR Prediction","authors":"Yijun Wang, Kaibo Xu","doi":"10.1109/ICKG55886.2022.00043","DOIUrl":"https://doi.org/10.1109/ICKG55886.2022.00043","url":null,"abstract":"Click-Through Rate (CTR) Prediction is one of the most critical components in recommender systems, where the task is to estimate the probability that a user clicks an item. In CTR models, embedding methods are widely used in feature representation to map categorical features into lower dimensional vectors, and thus those representations can be further exploited by various machine learning algorithms such as Factorization Machines (FMs) for CTR prediction. However, in the literature, most existing embedding models can only extract one latent vector for each individual feature as they calculate the feature interaction based on simple element product or inner product, limiting their ability to model user-item interactions in a high-dimensional space. They may miss some deep and complex latent feature interactions, and therefore lead to a less suitable representation as well as an inaccurate prediction. Motivated by this status quo, in this paper we propose a novel Multi-Kernel-FM (MKFM) framework for the task of CTR prediction. First of all, an embedding-based approach called Multi-FM (MFM) is proposed. It uses a multiple-embedding strategy and considers multiple representation sub-spaces for representing user-item features. After that, we construct an MKFM framework which combines kernel functions and MFM to capture non-linear feature interactions. Then, the concept of a kernel function is introduced and employed for capturing more high-dimensional feature interactions to further improve prediction accuracy. 
The results of our experiments on four public datasets demonstrate the superiority of the proposed framework over some existing methods with respect to both prediction accuracy and training cost.","PeriodicalId":278067,"journal":{"name":"2022 IEEE International Conference on Knowledge Graph (ICKG)","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126627486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HTransE: Hybrid Translation-based Embedding for Knowledge Graphs","authors":"A. Shah, Bonaventure Molokwu, Ziad Kobti","doi":"10.1109/ICKG55886.2022.00037","DOIUrl":"https://doi.org/10.1109/ICKG55886.2022.00037","url":null,"abstract":"Basically, a Knowledge Graph (KG) is a graph variant that represents data via triplets comprising a head, a tail, and a relation. Realistically, most KGs are compiled either manually or semi-automatically, and this usually results in a significant loss of vital information with respect to the KG. Thus, this problem of incompleteness is common to virtually all KGs, and it is formally defined as the Knowledge Graph Completion (KGC) problem. In this paper, we have explored learning the representations of a KG with regard to its entities and relations for the purpose of predicting any missing link(s). In that regard, this paper proposes a hybrid variant, composed of TransE and SimplE models, for solving KGC problems. On one hand, the TransE model depicts a relation as the translation from the source entity (head) to the target entity (tail) within an embedding space. In TransE, the head and tail entities are derived from the same embedding-generation class, which results in a low prediction score. Also, the TransE model is not able to capture symmetric relationships as well as one-to-many relationships. On the other hand, the SimplE model is based on Canonical Polyadic (CP) decomposition. SimplE enhances CP via the addition of the inverse relation, while the head entity and tail entity are derived from different embedding-generation classes which are interdependent. Hence, we employed the principle of inverse-relation embedding (from the SimplE model) onto the native TransE model so as to yield a new hybrid resultant: HTransE. Therefore, HTransE offers efficiency as well as improved prediction scores. In terms of efficiency, HTransE converges much more quickly than TransE. 
In other words, HTransE converges at approximately $n/2$ iterations where $n$ denotes the iterations required to fully train TransE. Our results outperform the native TransE approach with a significant difference. Also, HTransE outperforms several state-of-the-art models on different datasets.","PeriodicalId":278067,"journal":{"name":"2022 IEEE International Conference on Knowledge Graph (ICKG)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132517237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Progressive Feature Upgrade in Semi-supervised Learning on Tabular Domain","authors":"Morteza Mohammady Gharasuie, Fenjiao Wang","doi":"10.1109/ICKG55886.2022.00031","DOIUrl":"https://doi.org/10.1109/ICKG55886.2022.00031","url":null,"abstract":"Recent semi-supervised and self-supervised methods have shown great success in the image and text domains by utilizing augmentation techniques. Despite such success, it is not easy to transfer this success to a tabular domain. The common transformations from image and language are not easily adaptable to tabular data containing different data types (continuous and categorical data). There are a few semi-supervised works on the tabular domain that have focused on proposing new augmentation techniques for tabular data. These approaches may have shown some improvement on datasets with low-cardinality categorical data. However, the fundamental challenges have not been tackled. The proposed methods either do not apply to datasets with high cardinality or do not use an efficient encoding of categorical data. We propose using conditional probability representation and an efficient progressive feature-upgrading framework to effectively learn representations for tabular data in semi-supervised applications. 
The extensive experiments show the superior performance of the proposed framework and the potential application in semi-supervised settings.","PeriodicalId":278067,"journal":{"name":"2022 IEEE International Conference on Knowledge Graph (ICKG)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131647211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fausto Giunchiglia, V. Maltese, A. Ganbold, Alessio Zamboni
{"title":"An Architecture and a Methodology Enabling Interoperability within and across Universities","authors":"Fausto Giunchiglia, V. Maltese, A. Ganbold, Alessio Zamboni","doi":"10.1109/ICKG55886.2022.00017","DOIUrl":"https://doi.org/10.1109/ICKG55886.2022.00017","url":null,"abstract":"We propose a general methodology and an infrastructure that enable interoperability within the same university and across universities. The former goal is achieved by incrementally defining and building a knowledge graph (KG) using data coming from multiple heterogeneous databases. Interoperability across universities is achieved by having a reference KG schema that each university can adapt to its local needs while keeping track of the changes, and by natively supporting multilinguality. We achieve this latter requirement by exploiting a multilingual lexical resource containing more than one thousand languages and by seamlessly translating across the schemas and also (to some extent) across the data written in the local languages. The effectiveness of the proposed approach is proven by the services developed in the context of two different projects conducted in two universities in Italy and Mongolia.","PeriodicalId":278067,"journal":{"name":"2022 IEEE International Conference on Knowledge Graph (ICKG)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116911186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}