{"title":"Relational Facts Extraction with Splitting Mechanism","authors":"Yunzhou Shi, Yujiu Yang","doi":"10.1109/ICBK50248.2020.00060","DOIUrl":"https://doi.org/10.1109/ICBK50248.2020.00060","url":null,"abstract":"Relational fact extraction aims to extract triples from sentences. In recent years, sequence-to-sequence learning has been applied to this task because of its advantage in modeling three different entity overlapping types. However, existing models use the same RNN cell to decode both the entities and the relation in a triplet, even though the information required to predict entities differs from that required to predict relations. The process of extracting entities from the original sentence should therefore not be mixed with that of predicting the relation between them. Based on this observation, we propose a novel mechanism that splits the decoding of entities and relations. We conduct extensive experiments on the NYT and WebNLG datasets, and the results show that our Splitting Mechanism (SM) improves performance.","PeriodicalId":432857,"journal":{"name":"2020 IEEE International Conference on Knowledge Graph (ICKG)","volume":"55 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117170924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Outlier Detection via Kernel Preserving Embedding and Random Walk","authors":"Enhui Li, Huawen Liu, Kaile Su, Shichao Zhang","doi":"10.1109/ICBK50248.2020.00013","DOIUrl":"https://doi.org/10.1109/ICBK50248.2020.00013","url":null,"abstract":"Since outlier detection has a wide range of potential applications, it has become a fundamental and active research topic in data mining. Recently, the technique of self-representation has attracted extensive attention, and many low-rank-representation-based outlier detection algorithms have been proposed. However, most of them focus on minimizing the reconstruction error of the data without considering its manifold structure. Meanwhile, as a low-rank constraint, the single nuclear norm often leads to suboptimal solutions. To alleviate these problems, in this paper we propose a novel outlier detection method that adopts a kernel-based distance to retain the overall relations. Moreover, the double nuclear norm is exploited to address the suboptimality problem. Further, a tailored random walk is used to identify outliers once the similarity relations of the data are available. Extensive simulation experiments on five public datasets demonstrate the superiority of the proposed method compared to state-of-the-art outlier detection algorithms.","PeriodicalId":432857,"journal":{"name":"2020 IEEE International Conference on Knowledge Graph (ICKG)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133725996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cyber-guided Deep Neural Network for Malicious Repository Detection in GitHub","authors":"Yiming Zhang, Yujie Fan, Shifu Hou, Yanfang Ye, Xusheng Xiao, P. Li, C. Shi, Liang Zhao, Shouhuai Xu","doi":"10.1109/ICBK50248.2020.00071","DOIUrl":"https://doi.org/10.1109/ICBK50248.2020.00071","url":null,"abstract":"As the largest source code repository, GitHub plays a vital role in the modern social coding ecosystem for producing software. Despite the apparent benefits of this social coding paradigm, its potential security risks have been largely overlooked (e.g., malicious code or repositories can be easily embedded and distributed). To address this imminent issue, in this paper we propose a novel framework (named GitCyber) that makes a first attempt to automate malicious repository detection in GitHub. In GitCyber, we first extract code contents from repositories hosted on GitHub as inputs for a deep neural network (DNN); we then incorporate cybersecurity domain knowledge, modeled by a heterogeneous information network (HIN), to design a cyber-guided loss function in the learning objective of the DNN, assuring classification performance while preserving consistency with the observed domain knowledge. Comprehensive experiments on large-scale data collected from GitHub demonstrate that our proposed GitCyber outperforms the state of the art in malicious repository detection.","PeriodicalId":432857,"journal":{"name":"2020 IEEE International Conference on Knowledge Graph (ICKG)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126957146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BERT-BIGRU-CRF: A Novel Entity Relationship Extraction Model","authors":"Jianghai Lv, Junping Du, Nan Zhou, Zhe Xue","doi":"10.1109/ICBK50248.2020.00032","DOIUrl":"https://doi.org/10.1109/ICBK50248.2020.00032","url":null,"abstract":"Entity name recognition and entity relationship extraction are the most critical foundations for building knowledge graphs, and they are also basic tasks of NLP. The main purpose of entity relationship extraction is to extract the semantic relationship between pairs of marked entities in a sentence, that is, to determine the relationship categories between entity pairs in unstructured text based on entity identification, and to form structured data for storage and retrieval. This paper proposes a BERT-BIGRU-CRF entity relationship extraction method, which effectively changes the relationship between the word vectors generated by pre-training and the downstream NLP task, gradually adapting the downstream task to the pre-trained word vectors. Our method achieves better performance on relationship extraction and entity name recognition, which helps to construct knowledge graphs more accurately.","PeriodicalId":432857,"journal":{"name":"2020 IEEE International Conference on Knowledge Graph (ICKG)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122952455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SDSK2BERT: Explore the Specific Depth with Specific Knowledge to Compress BERT","authors":"Lifang Ding, Yujiu Yang","doi":"10.1109/ICBK50248.2020.00066","DOIUrl":"https://doi.org/10.1109/ICBK50248.2020.00066","url":null,"abstract":"The success of pretraining models like BERT in Natural Language Processing (NLP) puts forward a demand for model compression. Previous works adopting knowledge distillation (KD) to compress BERT are conducted with a fixed depth, so the problem of over-parameterization is not fully explored and the appropriate depth for a specific dataset remains unanswered. In this work, we take two Natural Language Inference (NLI) datasets with different difficulty levels as examples to answer the question of layer number. During the exploration of depth, we use the learned dataset-specific weights to warm up the network in the next run, helping the model find a better local optimum. With 1%~2% drops in accuracy, our method reduces the 12-layer BERT model to 6 layers on the MNLI-matched dataset and 2 layers on the DNLI dataset, which not only reduces the parameters to 1/2 and 1/6 respectively but also outperforms the general knowledge distillation framework by about 1% accuracy. Moreover, we explain why and when our framework works with the help of visualization.","PeriodicalId":432857,"journal":{"name":"2020 IEEE International Conference on Knowledge Graph (ICKG)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123384028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge Base Enhanced Topic Modeling","authors":"Dandan Song, Jingwen Gao, Jinhui Pang, L. Liao, Lifei Qin","doi":"10.1109/ICBK50248.2020.00061","DOIUrl":"https://doi.org/10.1109/ICBK50248.2020.00061","url":null,"abstract":"Topic models, such as Latent Dirichlet Allocation (LDA), are successful in learning hidden topics and have been widely applied in text mining. Many augmented topic modeling methods have recently been developed to utilize metadata information. However, the performance of topic models is still not comparable to that of humans. We believe one key reason is that humans have background knowledge, which is essential for topic understanding. Inspired by this, we propose a knowledge base enhanced topic model in this paper. We take knowledge bases, with their huge collections of entities and relations, as good representations of human knowledge. We assume that documents with related entities tend to have similar topic distributions. Based on this assumption, we compute document similarity via the linked entities and then use it as a constraint for LDA. More specifically, we embed entities in a low-dimensional space via DeepWalk and use Entity Movers Distance to efficiently and effectively measure the similarities between documents. Experiments on two real-world datasets show that our method boosts the LDA model on document classification while requiring no supervision information.","PeriodicalId":432857,"journal":{"name":"2020 IEEE International Conference on Knowledge Graph (ICKG)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121611534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Two-Level Noise-Tolerant Model for Relation Extraction with Reinforcement Learning","authors":"Erxin Yu, Yantao Jia, Yuan Tian, Yi Chang","doi":"10.1109/ICBK50248.2020.00059","DOIUrl":"https://doi.org/10.1109/ICBK50248.2020.00059","url":null,"abstract":"Distant supervision has been widely used to automatically label data for relation extraction, but it inevitably suffers from the wrong labeling problem. Existing methods address the noise by focusing on only one aspect, either the sentence level or the bag level; none consider the two levels as a whole. In this paper, we propose a deep reinforcement learning model that handles the noise at both the bag level and the sentence level. For a bag, i.e., a set of sentences containing the same pair of entities, the sentence-level extractor serves as an agent that predicts the label for each sentence and then determines the label for the bag. The bag-level extractor provides a delayed reward to the agent and iteratively promotes its performance. The experimental results show that our two-level denoising model effectively improves the performance of distantly supervised relation extraction compared to previous methods.","PeriodicalId":432857,"journal":{"name":"2020 IEEE International Conference on Knowledge Graph (ICKG)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132503158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamically Jointing character and word embedding for Chinese text Classification","authors":"Xuetao Tang, Xuegang Hu, Peipei Li","doi":"10.1109/ICBK50248.2020.00055","DOIUrl":"https://doi.org/10.1109/ICBK50248.2020.00055","url":null,"abstract":"Chinese text classification has drawn increasing attention in recent years. Unlike English texts, Chinese texts have no natural separator between words. With the development of deep learning, many character-level-only models have been proposed for Chinese text classification to tackle this problem, achieving more success than word-level models. But word information is also important for Chinese text representation, especially for short texts that carry less information. However, most neural network models either simply concatenate character-level and word-level representations, or use massive external knowledge to represent the whole text, which is complex and time-consuming. To represent Chinese text better and more easily, without any external knowledge and using as much character and word information as possible, we propose a simple model that dynamically joints character and word embeddings, called DJCW. First, character-level and word-level BiLSTM models are introduced to extract features from texts of indefinite length. Second, the character and word representations are combined with weights that change dynamically. Finally, experiments conducted on five open-source text datasets show that our model can handle texts of different lengths and achieves good, stable results.","PeriodicalId":432857,"journal":{"name":"2020 IEEE International Conference on Knowledge Graph (ICKG)","volume":"63 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132531556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ICKG 2020 TOC","authors":"Ickg, Youquan Wang, Shengjie Hu","doi":"10.1109/icbk50248.2020.00004","DOIUrl":"https://doi.org/10.1109/icbk50248.2020.00004","url":null,"abstract":"Integrating Bi-Dynamic Routing Capsule Network with Label-Constraint for Text Classification 4 Xiang Guo (Nanjing University of Finance and Economics), Youquan Wang (Nanjing University of Finance and Economics), Kaiyuan Gao (Xi’an Jiaotong-Liverpool University), Jie Cao (Nanjing University of Finance and Economics), Haicheng Tao (Nanjing University of Science and Technology), and Chaoyue Chen (Nanjing University of Finance and Economics)","PeriodicalId":432857,"journal":{"name":"2020 IEEE International Conference on Knowledge Graph (ICKG)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121046704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge Graph for China’s Genealogy","authors":"Xindong Wu, Tingting Jiang, Yi Zhu, Chenyang Bu","doi":"10.1109/ICBK50248.2020.00080","DOIUrl":"https://doi.org/10.1109/ICBK50248.2020.00080","url":null,"abstract":"Genealogical knowledge graphs depict the relationships of family networks and the development of family histories. They can help researchers analyze and understand genealogical data, search for genealogical roots, and explore the origins of a family more easily. However, the multi-type, multi-source, dynamically changing, and specialized nature of genealogical data brings challenges to the development of contemporary knowledge graph models. Applying existing methods to genealogical data can result in overlooking specialized vocabulary and dynamic properties such as personal experiences. In this paper, we propose a genealogical knowledge graph model, GKGM, that combines HAO intelligence (human intelligence + artificial intelligence + organizational intelligence) with ontology granularity division technology to address these problems. Furthermore, a method of applying the model to construct genealogical knowledge graphs is demonstrated, and an experiment conducted on a real-world genealogical dataset verifies the feasibility and effectiveness of the model.","PeriodicalId":432857,"journal":{"name":"2020 IEEE International Conference on Knowledge Graph (ICKG)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129303310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}