Joint Learning for Document-Level Threat Intelligence Relation Extraction and Coreference Resolution Based on GCN

2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). Pub Date: 2020-12-01. DOI: 10.1109/TrustCom50675.2020.00083

Xuren Wang, Mengbo Xiong, Yali Luo, Ning Li, Zhengwei Jiang, Zihan Xiong


Citations: 2

Abstract

Document-level relation extraction plays an important role in threat intelligence text analysis and processing, because it helps researchers quickly understand how new threat events connect to previous ones. Because no public document-level threat intelligence dataset exists, we create APTERC-DOC, a dataset of APT intelligence entities, relations, and coreference. We treat relation extraction as a multi-class classification task. By treating the coreference relation as one of the predefined relations, we develop a joint learning framework called TIRECO, a model that performs threat intelligence relation extraction and coreference resolution simultaneously. To address the problem that document-level text is too long for effective feature extraction, we propose the concept of a sentence set, which transforms document-level relation extraction into inter-sentence relation extraction. To incorporate relevant information while removing as much irrelevant content as possible from the sentence set, we further apply a novel pruning strategy (SDP-VP-SET) to the input trees, motivated by the observation that verbs are crucial in determining the relation between entities in a sentence set. While retaining the shortest path and the nodes that are K hops away from it, we multiply the weight of each edge connected to a verb node by w. Experimental results show that our model performs well not only on inter-sentence relations but also on intra-sentence relations, and that the F1 value increases by 15.694%.
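The pruning idea in the abstract can be illustrated with a small sketch. This is not the authors' SDP-VP-SET implementation, whose details the abstract does not give. It assumes that the input tree is an undirected graph over tokens, that "K hops away" means within K hops of the shortest path between the two entities, and that verb-connected edges have a base weight of 1.0 multiplied by w. The function names, the toy sentence, and its dependency edges are all hypothetical.

```python
from collections import deque


def shortest_path(adj, src, dst):
    """Breadth-first shortest path in an undirected graph {node: set(neighbors)}."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in sorted(adj[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    if dst not in parent:
        return []  # the two entities are not connected
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def prune_and_weight(edges, verbs, src, dst, k=1, w=2.0):
    """Keep the shortest path plus nodes within k hops of it (assumed reading
    of "K hops"), then give every surviving edge that touches a verb the
    weight w and every other surviving edge the weight 1.0 (assumed base)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    path = shortest_path(adj, src, dst)

    # Multi-source BFS from all path nodes, stopping at distance k.
    dist = {node: 0 for node in path}
    queue = deque(path)
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    kept = set(dist)

    weights = {}
    for a, b in edges:
        if a in kept and b in kept:
            weights[(a, b)] = w if (a in verbs or b in verbs) else 1.0
    return path, kept, weights


# Toy (hypothetical) dependency edges for:
# "APT28 used the malware to target banks in Europe"
edges = [
    ("used", "APT28"), ("used", "malware"), ("malware", "the"),
    ("used", "target"), ("target", "to"), ("target", "banks"),
    ("banks", "in"), ("in", "Europe"),
]
verbs = {"used", "target"}
path, kept, weights = prune_and_weight(edges, verbs, "APT28", "banks", k=1, w=2.0)
```

With k=1, the tokens "the" and "Europe" lie two hops from the path and are pruned, while edges touching "used" or "target" receive the weight w. In a GCN setting these weights would presumably populate the adjacency matrix, but that step is outside what the abstract describes.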
