An NLP-Based Framework to Spot Extremist Networks in Social Media

IF 1.7, Zone 4 (Engineering and Technology), JCR Q2 (Mathematics, Interdisciplinary Applications)

Complexity, Pub Date: 2024-07-15, DOI: 10.1155/2024/3380488

Andrés Zapata Rozo, Daniel Díaz-López, Javier Pastor-Galindo, Félix Gómez Mármol, Umit Karabiyik


Citations: 0

Abstract



Governments and law enforcement agencies (LEAs) are increasingly concerned about growing illicit activities in cyberspace, such as cybercrime, cyberespionage, cyberterrorism, and cyberwarfare. In the particular context of cyberterrorism, hostile social manipulation (HSM) is a strategy that employs different manipulation methods, mostly through social media, to promote extremism in social groups and encourage hostile behavior against a target. This paper therefore proposes a framework based on natural language processing (NLP) that detects and inspects suspected HSM actions to support LEAs in the prevention of cyberterrorism. The proposal integrates different NLP techniques through three models: (i) a similarity model that relates content with similar semantic meaning, (ii) a polarity analysis model that estimates polarity, and (iii) a named-entity recognition (NER) model that recognizes relevant entities. Each component of the framework is evaluated through exhaustive experiments, and the framework as a whole is tested on a use case related to violent protests in Ecuador in October 2021. In the use case, 3 clusters are obtained from the Spanish tweets and 4 from their English translations. A polarity analysis of the English-translated tweets identifies, through two different methods, the most negative cluster (#1). The mention-extraction results show that the framework identifies person-type entities that may be at risk with a precision of 89.91%. The knowledge graphs obtained in the use case reveal how nodes that promote HSM are interconnected and work collaboratively. Finally, the computational cost of the proposal is favorable: the memory consumption of the similarity and polarity models is proportional to the number of processed tweets, which supports the feasibility of the solution in a real context.
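To make the first two of the three models listed above more concrete, here is a minimal Python sketch of the general idea: group texts by similarity, then rank the groups by average polarity. It is not the authors' implementation. The abstract does not say which similarity measure, clustering algorithm, or polarity method the paper uses, so the bag-of-words cosine similarity, the greedy grouping, the `threshold` value, and the tiny `POSITIVE` and `NEGATIVE` word lists are all assumptions made for illustration. The NER step is left out.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicons (assumption): a real polarity model is far richer.
NEGATIVE = {"attack", "burn", "destroy", "hate"}
POSITIVE = {"peace", "calm", "support", "hope"}


def tokens(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(texts, threshold=0.5):
    """Greedy single-pass grouping (assumption, not the paper's method).

    Each text joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `texts`.
    """
    vectors = [Counter(tokens(t)) for t in texts]
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def polarity(text):
    """Lexicon score in [-1, 1]: (positive hits - negative hits) / token count."""
    toks = tokens(text)
    if not toks:
        return 0.0
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in toks) / len(toks)


def most_negative_cluster(texts, clusters):
    """Index of the cluster with the lowest mean polarity."""
    means = [sum(polarity(texts[i]) for i in c) / len(c) for c in clusters]
    return min(range(len(clusters)), key=means.__getitem__)


if __name__ == "__main__":
    sample = [
        "burn the city hall tonight",
        "we will burn the city hall",
        "hope for peace and calm",
        "peace and calm for everyone",
    ]
    groups = cluster(sample)
    print(groups, most_negative_cluster(sample, groups))
```

On the four made-up sample sentences, the two "burn" sentences fall into one group and the two "peace" sentences into another, and the first group is reported as the most negative. A production pipeline would more plausibly rely on sentence embeddings and a trained sentiment model, but the flow of data between the two steps would look similar.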


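Source journal

Complexity (Multidisciplinary journal: Mathematics, Interdisciplinary Applications)

CiteScore: 5.80

Self-citation rate: 4.30%

Articles published: 595

Review time: >12 weeks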

Journal description: Complexity is a cross-disciplinary journal focusing on the rapidly expanding science of complex adaptive systems. The purpose of the journal is to advance the science of complexity. Articles may deal with such methodological themes as chaos, genetic algorithms, cellular automata, neural networks, and evolutionary game theory. Papers treating applications in any area of natural science or human endeavor are welcome, and especially encouraged are papers integrating conceptual themes and applications that cross traditional disciplinary boundaries. Complexity is not meant to serve as a forum for speculation and vague analogies between words like “chaos,” “self-organization,” and “emergence” that are often used in completely different ways in science and in daily life.