Zhong Chen, Zhide Fang, Victor S. Sheng, Andrea Edwards, Kun Zhang
{"title":"CSRDA: Cost-sensitive Regularized Dual Averaging for Handling Imbalanced and High-dimensional Streaming Data","authors":"Zhong Chen, Zhide Fang, Victor S. Sheng, Andrea Edwards, Kun Zhang","doi":"10.1109/ICKG52313.2021.00031","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00031","url":null,"abstract":"Class imbalance is one of the most challenging problems in online learning due to its impact on the prediction capability of data stream mining models. Most existing approaches for online learning lack an effective mechanism to handle high-dimensional streaming data with skewed class distributions, resulting in insufficient model interpretation and deterioration of online performance. In this paper, we develop a cost-sensitive regularized dual averaging (CSRDA) method to tackle this problem. Our proposed method substantially extends the influential regularized dual averaging (RDA) method by formulating a new convex optimization function. Specifically, two $\\ell_1$-norm regularized cost-sensitive objective functions are each optimized directly. We then theoretically analyze CSRDA's regret bounds and the bounds of primal variables. Thus, CSRDA benefits from achieving a theoretical convergence of balanced cost and sparsity for severely imbalanced and high-dimensional streaming data mining. To validate our method, we conduct extensive experiments on six benchmark streaming datasets with varied imbalance ratios. The experimental results demonstrate that, compared to other baseline methods, CSRDA not only improves classification performance but also captures sparse features more effectively, and hence has better interpretability.","PeriodicalId":174126,"journal":{"name":"2021 IEEE International Conference on Big Knowledge (ICBK)","volume":"06 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130579015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Lu, Chaoqun Fei, Chuanqing Wang, Yu Huang, Songmao Zhang
{"title":"YABKO-Yet Another Big Knowledge Organization","authors":"R. Lu, Chaoqun Fei, Chuanqing Wang, Yu Huang, Songmao Zhang","doi":"10.1109/ICKG52313.2021.00041","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00041","url":null,"abstract":"Knowledge graphs and their processing techniques have received widespread attention from the AI and knowledge engineering communities. However, the platforms that support knowledge graphs have received much less attention. This paper emphasizes the role of knowledge graph platforms as an independent product of knowledge engineering. Starting from an introduction to HAPE, a programmable universal big knowledge graph platform and a predecessor of YABKO, we introduce the idea and technique of a Web-based, resource-sharing public knowledge graph laboratory and its implementation, YABKO, which has a threefold target. First, it is an open source platform for researchers doing experimental research on knowledge graphs supported by YABKO's own resources. Second, it supports research on big knowledge engineering, in particular in the knowledge graph area. Third, it supports full life cycle research on big knowledge graphs. Further, we introduce YABKOS, a constellation of YABKOs on the Web, which is a decentralized research lab for large scale knowledge graph experiments. We also introduce Knorc, a wide area programming language for orchestrating knowledge graph operations.","PeriodicalId":174126,"journal":{"name":"2021 IEEE International Conference on Big Knowledge (ICBK)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123208903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Gradient-based DAG Learning by Structural Asymmetry","authors":"Yujie Wang, Shuai Yang, Xianjie Guo, Kui Yu","doi":"10.1109/ICKG52313.2021.00022","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00022","url":null,"abstract":"Directed acyclic graph (DAG) learning, which aims to uncover the relationships between variables, plays a fundamental role in causal inference and other scientific settings. However, identifying a DAG from observational data has always been a challenging task. Recently, gradient-based DAG learning algorithms that convert the combinatorial-optimization DAG learning problem into a continuous-optimization problem have achieved growing success. These algorithms are easy to optimize and able to deal with both parametric and non-parametric data, but they tend to learn many reversed edges. In this paper, we propose a framework named Residual Independence Test (RIT) to correct those reversed edges by leveraging the structural asymmetry reflected in the dependence between regression residual and direct cause. We conduct extensive experiments on both synthetic and benchmark datasets; the results show that the RIT framework significantly improves the performance of gradient-based DAG learning algorithms.","PeriodicalId":174126,"journal":{"name":"2021 IEEE International Conference on Big Knowledge (ICBK)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123238267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evan Shieh, Saul Simhon, Geetha G. Aluri, Giorgos Papachristoudis, Doa Yakut, Dhanya Raghu
{"title":"Attribute Similarity and Relevance-Based Product Schema Matching for Targeted Catalog Enrichment","authors":"Evan Shieh, Saul Simhon, Geetha G. Aluri, Giorgos Papachristoudis, Doa Yakut, Dhanya Raghu","doi":"10.1109/ICKG52313.2021.00043","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00043","url":null,"abstract":"Many eCommerce catalogs rely on structured product data to provide a good experience for customers. For large scale services, product information is provided by millions of different manufacturer and vendor schemas. Due to the inherent heterogeneity of this data, unifying it to a consistent catalog schema remains a challenge. Schema matching is the problem of finding such correspondences between concepts in different distributed, heterogeneous data sources. Most approaches in automated schema matching assume a small number of source schemas, attributes, and contexts (e.g., matching movie attributes from media knowledge bases). By contrast, schema matching in product catalogs encounters the problem of scaling across millions of noisy, heterogeneous schemas spanning thousands of categories and attributes. In this paper, we introduce a scalable schema matching framework that utilizes unsupervised domain-specific attribute representations and general attribute similarity metrics. Our method first identifies relevant attributes for a given product based on existing customer signals, and then prioritizes among candidate attributes to consolidate only those relevant product facts from multiple manufacturers and vendors with little to no labeled data. We demonstrate value through experiments that enriched catalog data containing millions of attribute enumerations sourced from tens of thousands of schemas across a wide range of product categories. Experimental results show that automating schema matching on targeted product facts reduces manual annotation effort by 75% relative to competing schema matching efforts, while achieving high accuracy, precision, and recall for important attributes that contribute to customer interest. We also demonstrate performance improvements of 8% MRR using our approach compared against two well-established approaches to unsupervised schema matching.","PeriodicalId":174126,"journal":{"name":"2021 IEEE International Conference on Big Knowledge (ICBK)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116356656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Changyang Tai, Ze Yang, Huicheng Zhang, Gongqing Wu, Junwei Lv, Xianyu Bao
{"title":"Gaussian Model-Based Fully Convolutional Networks for Multivariate Time Series Classification","authors":"Changyang Tai, Ze Yang, Huicheng Zhang, Gongqing Wu, Junwei Lv, Xianyu Bao","doi":"10.1109/ICKG52313.2021.00028","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00028","url":null,"abstract":"Multivariate time series (MTS) classification has been regarded as one of the most challenging problems in data mining due to the difficulty in modeling the correlation of variables and samples. In addition, high-dimensional MTS modeling consumes considerable time and space. This paper proposes a novel method, Gaussian Model-based Fully Convolutional Networks (GM-FCN), to improve the performance of high-dimensional MTS classification. Each original MTS is converted into multivariate Gaussian model parameters as the input of FCN. These parameters effectively capture the correlation between MTS variables and significantly reduce the data scale by aligning an MTS size to its dimension. FCN is designed to learn more in-depth features of MTS based on these parameters for modeling the correlation between samples. Thus, GM-FCN can model not only the correlation between variables but also the correlation between samples. We compare GM-FCN with nine state-of-the-art MTS classification methods, 1NN-ED, 1NN-DTW-i, 1NN-DTW-D, KLD-GMC, MLP, ResNet, Encoder, MCNN, and MCDCNN, on four high-dimensional public datasets; experimental results show that the accuracy of GM-FCN is significantly superior to the others. Besides, GM-FCN trains dozens of times faster than an FCN that takes the original equal-length MTS data as input.","PeriodicalId":174126,"journal":{"name":"2021 IEEE International Conference on Big Knowledge (ICBK)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121635404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient framework for sentence similarity inspired by quantum computing","authors":"Yan Yu, Dong Qiu, Ruiteng Yan","doi":"10.1109/ICKG52313.2021.00030","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00030","url":null,"abstract":"Accurately extracting the semantic information and the syntactic structure of sentences is important in natural language processing. Existing methods mainly combine dependency trees with deep learning, at high computational cost, to obtain sufficient semantic information. It is essential to obtain sufficient semantic information and syntactic structures without any prior knowledge except word2vec. This paper proposes a sentence representation model inspired by quantum entanglement, using the tensor product to entangle both pairs of consecutive notional words and words with dependencies. Inspired by quantum entanglement coefficients, we construct two different entanglement coefficients to weight the different semantic contributions of words with different relations. Finally, the proposed model is applied to SICK_train to verify its performance. The experimental results show that the proposed methods achieve perfect results.","PeriodicalId":174126,"journal":{"name":"2021 IEEE International Conference on Big Knowledge (ICBK)","volume":"227 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114748706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Round Parsing-based Multiword Rules for Scientific Knowledge Extraction","authors":"Joseph Kuebler, Lingbo Tong, Meng Jiang","doi":"10.1109/ICKG52313.2021.00051","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00051","url":null,"abstract":"Information extraction (IE) in scientific literature has facilitated many downstream knowledge-driven tasks. OpenIE, which does not require any relation schema but identifies a relational phrase to describe the relationship between a subject and an object, is a trending topic of IE in the sciences. The subjects, objects, and relations are often multiword expressions, which makes it challenging for methods to identify the boundaries of the expressions given very limited or even no training data. In this work, we present a set of rules for extracting structured information based on dependency parsing that can be applied to any scientific dataset without requiring expert annotation. Results on novel datasets show the effectiveness of the proposed method. We discuss negative results as well.","PeriodicalId":174126,"journal":{"name":"2021 IEEE International Conference on Big Knowledge (ICBK)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134445594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chenyu Cao, C. Yan, Fangtao Li, Zihe Liu, Z. Wang, Bin Wu
{"title":"Recognizing Characters and Relationships from Videos via Spatial-Temporal and Multimodal Cues","authors":"Chenyu Cao, C. Yan, Fangtao Li, Zihe Liu, Z. Wang, Bin Wu","doi":"10.1109/ICKG52313.2021.00032","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00032","url":null,"abstract":"Video contains rich semantic knowledge of multiple modalities related to a person. Mining deep or potential semantic knowledge in video could help artificial intelligence better understand the behavior and emotion of humans in the video. Research on deep and contextual semantic knowledge in video is currently scarce. Many studies on mining knowledge about characters and visual relationships between humans still focus on static images, paying little attention to temporal visual features and other important modalities. To better mine the semantic knowledge in video, we propose the novel Global-local VLAD (GL-VLAD) module, which uses convolutions of different scales to enlarge different receptive fields and extract the global and local information of features in the video. In addition, we propose a Multimodal Fusion Graph (MFG) to focus on the knowledge of different modalities, which can represent the general features in multi-modal video scenes. We use this method to conduct a large number of experiments on social relation extraction and person recognition on the MovieGraphs and IQIYI-VID-2019 datasets. The accuracy and mAP reach 90.23% and 89.87%, respectively, on IQIYI-VID-2019. On the fine-grained dataset MovieGraphs, the accuracy reaches 56.13% for the relation extraction task, while person recognition attains 89.31% accuracy and 85.24% mAP. The experimental results show that our proposed method has better performance than the state-of-the-art methods.","PeriodicalId":174126,"journal":{"name":"2021 IEEE International Conference on Big Knowledge (ICBK)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122046670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Personalized Recommendation Based On Entity Attributes and Graph Features","authors":"Yi Zhu, Bingbing Dong, Zhiqing Sha","doi":"10.1109/ICKG52313.2021.00011","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00011","url":null,"abstract":"With the rapid increase in the amount of website data, it has become more difficult for users to get the information they are interested in. Personalized recommendation is an important bridge for finding the information that users really need on a website. Many recent studies have introduced additional attribute information about users and/or items to the rating matrix to alleviate the problem of data sparsity. To make full use of the attribute information and the rating matrix, deep learning based recommendation methods have been proposed; the autoencoder model in particular has attracted much attention because of its strong ability to learn hidden features. However, most of the existing autoencoder-based models require that the dimension of the input layer equal the dimension of the output layer, which may increase model complexity and cause some information loss when using attribute information. In addition, as users' awareness of privacy protection increases, user attribute information is difficult to obtain. To address the above problems, in this paper, we propose a hybrid personalized recommendation model (Co-Agpre for short), which uses a semi-autoencoder to jointly embed the item's score vector and internal graph features. Specifically, we regard the user-item historical interaction matrix as a bipartite graph, and the Laplacian of the user-item co-occurrence graph is utilized to obtain the graph features of the item for solving the problem of sparse attributes. Then a semi-autoencoder is introduced to learn the hidden features of the item and perform rating prediction. The proposed model can flexibly use information from different sources to reduce the complexity of the model. Experiments on two real-world datasets demonstrate the effectiveness of the proposed Co-Agpre compared with state-of-the-art methods.","PeriodicalId":174126,"journal":{"name":"2021 IEEE International Conference on Big Knowledge (ICBK)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130725767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shujing Ke, P. Spronck, B. Goertzel, Alex Van der Peet
{"title":"Surprisingness - A Novel Objective Interestingness Measure in Hypergraph Pattern Mining from Knowledge Graphs for Common Sense Learning","authors":"Shujing Ke, P. Spronck, B. Goertzel, Alex Van der Peet","doi":"10.1109/ICKG52313.2021.00017","DOIUrl":"https://doi.org/10.1109/ICKG52313.2021.00017","url":null,"abstract":"Pattern mining usually results in huge amounts of patterns, among which only a small percentage are interesting. In this paper, Surprisingness (including Surprisingness_I and Surprisingness_II) is proposed as an innovative objective multivariate interestingness measure for automatically identifying interesting patterns from a large quantity of patterns. Compared to existing measures, Surprisingness is applicable to unstructured or semi-structured, multi-domain or mixed-domain data. An experiment has been conducted enabling unsupervised learning of common sense, interesting patterns and exceptions from a knowledge graph database built from Wikipedia-extracted data (represented as directed labeled hypergraphs), using Surprisingness.","PeriodicalId":174126,"journal":{"name":"2021 IEEE International Conference on Big Knowledge (ICBK)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126543680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}