{"title":"Multi-Kernel-FM: A Multi-Embedding & Kernelization Factorization Machine Framework for CTR Prediction","authors":"Yijun Wang, Kaibo Xu","doi":"10.1109/ICKG55886.2022.00043","DOIUrl":"https://doi.org/10.1109/ICKG55886.2022.00043","url":null,"abstract":"Click-Through Rate (CTR) Prediction is one of the most critical components in recommender systems, where the task is to estimate the probability that a user clicks an item. In CTR models, embedding methods are widely used in feature representation to map categorical features into lower dimensional vectors, and thus those representations can be further exploited by various machine learning algorithms such as Factorization Machines (FMs) for CTR prediction. However, in the literature, most existing embedding models can only extract one latent vector for each individual feature as they calculate the feature interaction based on simple element product or inner product, limiting its ability to model user-item interactions in a high-dimensional space. It may miss some deep and complex interacted latent features, and therefore lead to a less proper representation as well as an inaccurate prediction. Motivated by the status quo, in this paper, we therefore propose a novel Multi-Kernel-FM (MKFM) framework for the task of CTR prediction. First of all, an embedding-based approach called Multi-FM (MFM) is proposed. It uses multiple embedding strategy and considers multiple representation sub-spaces for representing user-item features. After that, we construct a MKFM framework which combines kernel function and MFM to capture non-linear feature interactions. Then, the concept of kernel function is introduced and employed for capturing more high-dimensional feature interactions to further improve prediction accuracy. The results of our experiments on four public datasets demonstrate the superiorities of the proposed framework to some existing methods with respect to both prediction accuracy and training cost.","PeriodicalId":278067,"journal":{"name":"2022 IEEE International Conference on Knowledge Graph (ICKG)","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126627486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HTransE: Hybrid Translation-based Embedding for Knowledge Graphs","authors":"A. Shah, Bonaventure Molokwu, Ziad Kobti","doi":"10.1109/ICKG55886.2022.00037","DOIUrl":"https://doi.org/10.1109/ICKG55886.2022.00037","url":null,"abstract":"Basically, a Knowledge Graph (KG) is a graph variant that represents data via triplets comprising a head, a tail, and a relation. Realistically, most KGs are compiled either manually or semi-automatically, and this usually results in a significant loss of vital information with respect to the KG. Thus, this problem of incompleteness is common to virtually all KGs; and it is formally defined as Knowledge Graph Completion (KGC) problem. In this paper, we have explored learning the representations of a KGs with regard to its entities and relations for the purpose of any predicting missing link(s). In that regard, this paper proposes a hybrid variant, composed of TransE and SimplE models, for solving KGC problems. On one hand, the TransE model depicts a relation as the translation from the source entity (head) to the target entity (tail) within an embedding space. In TransE, the head and tail entities are derived from the same embedding-generation class, which results in a low prediction score. Also, the TransE model is not able to capture symmetric relationships as well as one-to-many relationships. On the other hand, the SimplE model is based on Canonical Polyadic (CP) decomposition. SimplE enhances CP via the addition of the inverse relation, while the head entity and tail entity are derived from different embedding-generation classes which are interdependent. Hence, we employed the principle of inverse-relation embedding (from the SimplE model) onto the native TransE model so as to yield a new hybrid resultant: HTransE. Therefore, HTransE boasts of efficiency as well as improved prediction scores. Efficiently, HTransE converges much quicker in comparison to TransE. In other words, HTransE converges at approximately $n/2$ iterations where $n$ denotes the iterations required to fully train TransE. Our results outperform the native TransE approach with a significant difference. Also, HTransE outperforms several state-of-the-art models on different datasets.","PeriodicalId":278067,"journal":{"name":"2022 IEEE International Conference on Knowledge Graph (ICKG)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132517237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Progressive Feature Upgrade in Semi-supervised Learning on Tabular Domain","authors":"Morteza Mohammady Gharasuie, Fenjiao Wang","doi":"10.1109/ICKG55886.2022.00031","DOIUrl":"https://doi.org/10.1109/ICKG55886.2022.00031","url":null,"abstract":"Recent semi-supervised and self-supervised methods have shown great success in the image and text domains by utilizing augmentation techniques. Despite such success, it is not easy to transfer this success to a tabular domain. The common transformations from image and language are not easily adaptable to tabular data containing different data types (continuous and categorical data). There are a few semi-supervised works on the tabular domain that have focused on proposing new augmentation techniques for tabular data. These approaches may have shown some improvement in datasets with low-cardinality in categorical data. However, the fundamental challenges have not been tackled. The proposed methods either do not apply to datasets with high-cardinality or do not use an efficient encoding of categorical data. We propose using conditional probability representation and an efficient progressively feature upgrading framework to effectively learn representations for tabular data in semi-supervised applications. The extensive experiments show the superior performance of the proposed framework and the potential application in semi-supervised settings.","PeriodicalId":278067,"journal":{"name":"2022 IEEE International Conference on Knowledge Graph (ICKG)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131647211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Architecture and a Methodology Enabling Interoperability within and across Universities","authors":"Fausto Giunchiglia, V. Maltese, A. Ganbold, Alessio Zamboni","doi":"10.1109/ICKG55886.2022.00017","DOIUrl":"https://doi.org/10.1109/ICKG55886.2022.00017","url":null,"abstract":"We propose a general methodology and an infrastructure which allows to achieve interoperability within the same university and across universities. The former goal is achieved by incrementally defining and building a knowledge graph (KG) using data coming from multiple heterogeneous databases. Interoperability across universities is achieved by having a reference KG schema that each university can adapt to the local needs, but keeping track of the changes, and by natively supporting multilinguality. We achieve this latter requirement by exploiting a multilingual lexical resource containing more than one thousand languages and by seamlessly translating across the schemas and also (to some extent) across the data written in the local languages. The effectiveness of the proposed approach is proven by the services developed in the context of two different projects conducted in two universities in Italy and Mongolia.","PeriodicalId":278067,"journal":{"name":"2022 IEEE International Conference on Knowledge Graph (ICKG)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116911186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Robust Semantic Frame Parsing Pipeline on a New Complex Twitter Dataset","authors":"Yu Wang, Hongxia Jin","doi":"10.1109/ICKG55886.2022.00044","DOIUrl":"https://doi.org/10.1109/ICKG55886.2022.00044","url":null,"abstract":"Most recent semantic frame parsing systems for spoken language understanding (SLU) are designed based on recurrent neural networks. These systems display decent performance on benchmark SLU datasets such as ATIS or SNIPS, which contain short utterances with relatively simple patterns. However, the current semantic frame parsing models lack a mechanism to handle out-of-distribution (OOD) patterns and out-of-vocabulary (OOV) tokens. In this paper, we introduce a robust semantic frame parsing pipeline that can handle both OOD patterns and OOV tokens in conjunction with a new complex Twitter dataset that contains long tweets with more OOD patterns and OOV tokens. The new pipeline demonstrates much better results in comparison to state-of-the-art baseline SLU models on both the SNIPS dataset and the new Twitter dataset11Our new Twitter dataset can be downloaded from https://1drv.ms/u/s!AroHb-W6_OAlavK4begsDsMALfE?e=c8f2XX. Finally, we also build an E2E application to demo the feasibility of our algorithm and show why it is useful in real application.","PeriodicalId":278067,"journal":{"name":"2022 IEEE International Conference on Knowledge Graph (ICKG)","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126267021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Sentence Matching via Exploiting List-level Semantics Expansion","authors":"Ruijun Sun, Zhi Li, Qi Liu, Zhefeng Wang, Xinyu Duan, Baoxing Huai, N. Yuan","doi":"10.1109/ICKG55886.2022.00039","DOIUrl":"https://doi.org/10.1109/ICKG55886.2022.00039","url":null,"abstract":"Sentence semantic matching plays a fundamental role in natural language understanding and is widely used in various tasks. Previous methods are mainly based on the One-to-One single pair matching process, while in many real application scenarios, such as user query matching in FAQ, we need to match multiple sentences with the specified query sentence. The reality is that the multi-sentence matching problem is largely unexplored. Although to some extent, we can use multiple One-to-One pair matching on the multi-sentence matching problem, the exploitation of list-level candidate semantics is still limited, which leads to inferior performance. To that end, in this paper, we present a focused study on the novel multi-sentence matching problem. To begin with, we propose a Multi-Sentence Matching Network (MSMN) with a meticulously designed Fusion Module and Contrastive Matching Module to deeply exploit the list-level candidate semantics expansion for the multi-sentence matching task. Then, we construct two multi-sentence matching datasets, i.e., mQuora and mReceipt, to bridge the gap of benchmark data in this area. Finally, we conduct extensive experiments on two datasets. The experimental results clearly demonstrate the effectiveness of our method.","PeriodicalId":278067,"journal":{"name":"2022 IEEE International Conference on Knowledge Graph (ICKG)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129951542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Foreign Accent Conversion using Concentrated Attention","authors":"Xuexue Zang, Fei Xie, F. Weng","doi":"10.1109/ICKG55886.2022.00056","DOIUrl":"https://doi.org/10.1109/ICKG55886.2022.00056","url":null,"abstract":"Foreign accent conversion is an important and challenging problem due to significant differences in the manner of articulation and the speech prosody of different regional speakers. In this paper, we propose a new method for the problem of foreign accent conversion that uses Phonetic Posteriorgrams (PPGs) and Log-scale Fundamental frequency (Log-F0) to address the mismatches of phonetic and prosody. Furthermore, we propose using concentrated attention to improve the alignment of input sequences and mel-spectrograms. The concentrated attention selects the top k highest score values in the attention matrix row by row. In this way, the attention weight of the content related to the current sequence will be the largest. Our approach first trains a PPG extractor using LibriSpeech Corpus, which uses an end-to-end hybrid CTC-attention model. Then, the modified Tacotron2 based on concentrated attention is trained to model the relationships between PPGs and mel-spectrograms. In our proposed framework, the input of Tacotron2 is the concatenation of PPG embedding and normalized Log-scale fundamental frequency (Log-F0). In the convert stage, WaveGlow is modeled to generate speech, which is a streaming structure. To better verify the effectiveness of our proposed method, we also add some objective evaluation methods. These include Mel spectral distance, Object_MOS score, speaker similarity, and similarity in the embedding space of the entire speech. Experiments shows that our proposed concentrated attention method delivers comparable or better results than the previous foreign accent conversion method in terms of voice naturalness, speaker similarity to the source speaker, and accent similarity to the target speaker.","PeriodicalId":278067,"journal":{"name":"2022 IEEE International Conference on Knowledge Graph (ICKG)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134305098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weighted Transformer for Dialect Speech Recognition","authors":"Minghan Zhang, Fei Xie, F. Weng","doi":"10.1109/ICKG55886.2022.00055","DOIUrl":"https://doi.org/10.1109/ICKG55886.2022.00055","url":null,"abstract":"End-to-end automatic speech recognition (ASR) with transformer models has recently made significant progress, surpassing Recurrent Neural Networks (RNNs) or Convolutional Neural Networks (CNNs) mainly because it can model larger context in a parameter-efficient way using self-attention and feed-forward layers. Among them, the self-attention mechanism has received great attention from scholars and it has demonstrated promising results in various natural speech recognition (ASR) tasks. However, the classical transformer-based approaches usually require a large number of parameters and training data, with many training iterations to converge well. In addition, dialect is a common phenomenon in our daily life, and one may not have a large amount of data to start with. More importantly, different dialects have obvious differences at the phonological level although they may share large similarities at the lexical level. Thus, efficient training of highly accurate ASR models for various dialects is very much desirable but remains a challenging problem. In this work, we introduce a weighted transformer that makes better use of the multi-head attention mechanism and obtains more accurate results for four spoken English dialects. We pretrain our base models including weighted transformers using the 960 hours LibriSpeech dataset and adapt them on English dialect data of Common Voice and LibriSpeech SLR83 speech datasets respectively. Experimental results further show that the added weight can distinguish different dialects to obtain better representation. And our proposed dialect-dependent ASR system is significantly more accurate than the classical transformer baseline. In addition, during the training process, we found that the training speed of the new model has been improved by 15%-30%.","PeriodicalId":278067,"journal":{"name":"2022 IEEE International Conference on Knowledge Graph (ICKG)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133747468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple Anomaly Alignments on Network Traffics","authors":"Nannan Wu, Ning Zhang, Qingcheng Lu, Jie Zhang, Wenjun Wang, Ying Sun, Siddharth Bhatia","doi":"10.1109/ICKG55886.2022.00047","DOIUrl":"https://doi.org/10.1109/ICKG55886.2022.00047","url":null,"abstract":"Anomaly subgraph detection has been widely used in various scenarios and fields (e.g., congestion related to passenger cars). Most existing methods for discovering anomalies in multiple attribute networks are embedding multiple attributes in a single network. The main challenge is that not all multiple networks can be integrated into a single network. It is challenging to optimize the same anomaly in multiple attributed networks. In this work, we propose the method for anomaly alignment across multiple attribute networks (A3MAN). Multiple anomalies are aligned with structural characteristics and anchor links between networks. Finally, we have conducted the experiments on the four real datasets, and the detected more aligned anomalies consisted of anchor links, which are 8.8 times that of the competitive methods (up to 63 times at the noise ratio 30%). For the case study on the real traffic dataset in Tianjin, China, we found that the “Yingfengdao” subway station located in the center of the “Nankai University Town” is the peak order location for bicycle-sharing and car-hailing region.","PeriodicalId":278067,"journal":{"name":"2022 IEEE International Conference on Knowledge Graph (ICKG)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132847824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clustering by peak merging","authors":"Mi-Sung Han, Jong-Seok Lee","doi":"10.1109/ICKG55886.2022.00019","DOIUrl":"https://doi.org/10.1109/ICKG55886.2022.00019","url":null,"abstract":"Clustering by fast search-and-find of density peaks (CFSFDP) is a recently developed density-based clustering method that is being widely used as it can effectively detect isolated high-density regions. However, it often fails to identify true cluster structures from data owing to its intrinsic assumption that a cluster has a unique and high-density center, because a single cluster can contain several peaks. We call this the “multi-peak problem”. To overcome this, we propose a peak merging method for clustering. In the proposed algorithm, a valley and its local density are defined to identify the intersection between two adjoined peaks. These are used to construct directed and connected subgraphs, using which we merge multiple peaks if needed. Unlike CFSFDP and its variants, the proposed method is capable of identifying highly complex shaped clusters with no interpretation of a decision graph. Numerical experiments based on synthetic and real datasets demonstrate that our method outperforms the benchmarking methods.","PeriodicalId":278067,"journal":{"name":"2022 IEEE International Conference on Knowledge Graph (ICKG)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132113665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}