Entity and attribute extraction of terrorism event based on text corpus
Authors: 曹文斌, 武卓峰, 杨涛, 凡友荣
Journal: 工程设计学报 (Chinese Journal of Engineering Design), vol. 60, no. 1, pp. 500-508
Published: 2020-04-01
DOI: 10.13374/J.ISSN2095-9389.2019.09.13.003 (https://doi.org/10.13374/J.ISSN2095-9389.2019.09.13.003)
Citations: 0
Abstract
Affected by complex international factors, terrorism events have become increasingly rampant in many countries in recent years, posing a great threat to the global community. In addition, as emerging technologies are widely adopted in military and commercial fields, terrorist organizations have begun to use them for destructive activities. With the development of the Internet and information technology, terrorism has spread rapidly into cyberspace. Terrorist organizations have created terrorism websites, established multinational networks, released recruitment information, and even conducted training activities through mainstream websites with worldwide reach. Compared with traditional terrorist activities, cyber terrorist activities are more destructive. Cybercrime and cyber terrorism have become the most serious challenges facing society. Terrorist organizations exploit the Internet to disseminate extremist ideas rapidly and to recruit large numbers of terrorists and supporters around the world, especially in developed Western countries. They even use the Internet and the "dark net" to conduct terrorist training while keeping their activities concealed. As a result, "lone wolf" terrorist attacks, which are difficult to prevent, have occurred one after another in many countries. This study proposes a method for extracting entities and attributes of terrorism events based on semantic role analysis, providing technical support for monitoring and predicting terrorist activities in cyberspace.
First, a naive Bayes text classifier is used to identify terrorism events in a cleaned text corpus collected from the Anti-Terrorism Information Site of Northwest University of Political Science and Law. Next, the TF-IDF keyword extraction algorithm, combined with natural language processing techniques, is used to construct terrorism vocabularies from the classified corpus. Then, semantic role analysis and syntactic dependency analysis are performed to mine post-positioned attributive relationships, person name/place name/organization name relationships, and preposition-object relationships. Finally, regular expressions and the constructed terrorism-specific vocabularies are applied to four types of triple-based short texts to extract six entity attributes of terrorism events: time of occurrence, location of occurrence, casualties, attack method, weapon type, and terrorist organization. In experiments on 4,221 collected articles, the F1 score for each of the six attribute types exceeded 80%. The proposed method therefore has practical significance for maintaining public safety, as it can help monitor and predict terrorism events in cyberspace.
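The abstract does not describe the classifier's features or smoothing. As a rough sketch of the first step, the following multinomial naive Bayes classifier with add-one (Laplace) smoothing labels a tokenized text as terrorism-related or not. It assumes texts are already segmented into words (Chinese text would first need a word segmenter); the class name `NaiveBayesTextClassifier`, the labels `"terror"`/`"other"`, and the training sentences are hypothetical illustrations, not the authors' code or data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing.

    Documents are assumed to be pre-tokenized lists of words.
    """

    def fit(self, docs, labels):
        self.vocab = set()
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter(labels)        # label -> number of documents
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(docs)
        return self

    def log_posterior(self, tokens, label):
        # log P(label) + sum of log P(token | label), smoothed with +1
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.doc_counts[label] / self.total_docs)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are skipped
                score += math.log((counts[tok] + 1) / denom)
        return score

    def predict(self, tokens):
        # Pick the label with the highest (unnormalized) log posterior.
        return max(self.doc_counts, key=lambda lbl: self.log_posterior(tokens, lbl))


# Hypothetical toy training data (not from the paper's corpus).
train = [
    ("gunmen attacked a market killing five people".split(), "terror"),
    ("a suicide bomber detonated explosives near the embassy".split(), "terror"),
    ("the militant group claimed the bombing attack".split(), "terror"),
    ("the central bank raised interest rates".split(), "other"),
    ("the team won the football match".split(), "other"),
    ("new smartphone sales rose this quarter".split(), "other"),
]
docs, labels = zip(*train)
clf = NaiveBayesTextClassifier().fit(list(docs), list(labels))
print(clf.predict("a bomber attacked the market".split()))  # terror
print(clf.predict("the bank raised rates".split()))         # other
```

Scoring in log space avoids floating-point underflow when many small word probabilities are multiplied together.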
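For the vocabulary-construction step, the abstract names TF-IDF but not the exact weighting or how per-document scores are aggregated. The sketch below assumes term frequency normalized by document length, IDF = log(N / df), and ranks each word by its highest score in any document; the function name `tfidf_keywords` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=5):
    """Rank words by their maximum TF-IDF score over a corpus (hypothetical helper).

    docs: list of token lists, assumed already segmented with stop words removed.
    Returns the top_k words, which could seed a domain vocabulary.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each word once per doc
    best = {}
    for tokens in docs:
        tf = Counter(tokens)
        for word, count in tf.items():
            idf = math.log(n_docs / df[word])
            score = (count / len(tokens)) * idf
            best[word] = max(best.get(word, 0.0), score)
    # Sort by descending score, breaking ties alphabetically for stable output.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_k]]


# Hypothetical documents already classified as terrorism-related.
docs = [
    "bomb explosion market bomb casualties".split(),
    "gunmen attack police casualties".split(),
    "bomb attack embassy casualties".split(),
]
print(tfidf_keywords(docs, top_k=5))
# ['embassy', 'gunmen', 'police', 'explosion', 'market']
```

Under this variant a word that appears in every document (here `casualties`) gets an IDF of zero, so domain terms common to the whole classified corpus would rank last and might need to be added to the vocabulary by other means.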
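The final step applies regular expressions and the domain vocabularies to short texts, and the results are scored with F1. The sketch below is an English-language illustration under stated assumptions: the input is a single short text (the paper derives its short texts from parsed triples, which is not reproduced here), the vocabularies `ATTACK_METHODS`, `WEAPON_TYPES`, and `ORGANIZATIONS` and all patterns are hypothetical, and F1 is computed by exact matching over sets of (attribute, value) pairs, which may differ from the paper's evaluation protocol.

```python
import re

# Hypothetical terrorism vocabularies; the paper builds its own with TF-IDF.
ATTACK_METHODS = {"bombing", "shooting", "stabbing", "hijacking"}
WEAPON_TYPES = {"explosives", "rifle", "knife", "truck bomb"}
ORGANIZATIONS = {"Group X", "Front Y"}

# Illustrative patterns for English text; Chinese text would need different ones.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
LOCATION_RE = re.compile(r"\bin ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")
CASUALTY_RE = re.compile(r"\b(\d+) (?:people )?(killed|dead|injured|wounded)\b")
CASUALTY_KIND = {"killed": "killed", "dead": "killed",
                 "injured": "injured", "wounded": "injured"}


def vocab_hits(text, vocab):
    """Return vocabulary terms found in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return sorted(term for term in vocab if term.lower() in lowered)


def extract_attributes(text):
    """Extract the six attribute types from one short text."""
    date = DATE_RE.search(text)
    place = LOCATION_RE.search(text)  # first "in <Capitalized Words>" phrase
    casualties = {}
    for number, kind in CASUALTY_RE.findall(text):
        key = CASUALTY_KIND[kind]
        casualties[key] = casualties.get(key, 0) + int(number)
    return {
        "time": date.group(0) if date else None,
        "location": place.group(1) if place else None,
        "casualties": casualties,
        "attack_method": vocab_hits(text, ATTACK_METHODS),
        "weapon_type": vocab_hits(text, WEAPON_TYPES),
        "organization": vocab_hits(text, ORGANIZATIONS),
    }


def f1(predicted, gold):
    """F1 over sets of (attribute, value) pairs, exact match."""
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


text = ("On 2018-05-02 Group X carried out a bombing in Port Town "
        "using explosives; 12 killed and 30 injured.")
attrs = extract_attributes(text)
print(attrs["location"], attrs["casualties"])  # Port Town {'killed': 12, 'injured': 30}

# Flatten the list-valued attributes into (attribute, value) pairs for scoring.
pairs = {("time", attrs["time"]), ("location", attrs["location"])}
for key in ("attack_method", "weapon_type", "organization"):
    pairs |= {(key, value) for value in attrs[key]}
# Hypothetical gold annotation that records the weapon more specifically.
gold = {("time", "2018-05-02"), ("location", "Port Town"),
        ("attack_method", "bombing"), ("weapon_type", "truck bomb"),
        ("organization", "Group X")}
print(round(f1(pairs, gold), 3))  # 0.8
```

Four of the five predicted pairs match the gold set, so precision and recall are both 0.8 and so is F1.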
About the journal:
Chinese Journal of Engineering Design is a reputable journal published by Zhejiang University Press Co., Ltd. Founded in December 1994, it was the first internationally cooperative journal in the field of engineering design research. It is administered by the Ministry of Education of China and jointly sponsored by Zhejiang University and the Chinese Society of Mechanical Engineering. Zhejiang University Press Co., Ltd. is fully responsible for its bimonthly domestic and overseas publication, which is printed in A4 format. The journal reports the latest achievements in engineering design research in order to promote academic communication and the application of research results in industry. Highly creative and practical work is especially welcome. Aiming to provide designers, developers, and researchers of diverse technical artifacts with valuable references, it covers all aspects of design theory and methodology, as well as their enabling environments, including creative design, concurrent design, conceptual design, intelligent design, web-based design, reverse engineering design, industrial design, design optimization, tribology, design by biological analogy, virtual reality in design, structural analysis and design, design knowledge representation, design knowledge management, and design decision-making systems.