Identification of Violence in Twitter Using a Custom Lexicon and NLP

European Conference on Cyber Warfare and Security. Pub Date: 2022-06-08. DOI: 10.34190/eccws.21.1.340

Jonathan Adkins



Abstract

Information warfare is no longer a denizen purely of the political domain. It is a phenomenon that permeates other domains, especially those of mass communications and cybersecurity. Deepfakes, sock puppets, and microtargeted political advertising on social media are some examples of techniques that have been employed by threat actors to exert influence over consumers of mass media. Social Network Analysis (SNA) is an aggregation of tools and techniques used to research and analyze the nature of relationships between entities. SNA makes use of such tools as text mining, sentiment analysis, and machine learning algorithms to identify and measure aspects of human behavior in certain defined conditions. One area of interest in SNA is the ability to identify and measure levels of strong emotions in groups of people. In particular, we have developed a technique in which the potential for increased violence within a community can be identified and measured using a combination of text mining, sentiment analysis, and graph theory. We have compiled a custom lexicon of terms used commonly in discussions relating to acts of violence. Each term in the lexicon has a numerical weight associated with it, indicating how violent the term is. We will take samples of online community discussions from Twitter and use the R and Python programming languages to cross-reference the samples with our lexicon. The results will be displayed in a Twitter discussion graph where the user nodes are color-coded according to the overall level of violence that is inherent in the Tweet. This methodology will demonstrate which communities within an online social network discussion are more at risk for potentially violent behavior. We assert that when this approach is used in association with other NLP techniques such as word embeddings and sentiment analysis, it will provide cybersecurity and homeland security analysts with actionable threat intelligence.
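The lexicon cross-referencing and node color-coding described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the lexicon terms and weights, the color thresholds, the use of @mentions as graph edges, and the choice to average tweet scores per user are all assumptions made for illustration, and every function name here is hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical lexicon: these terms and weights are invented for illustration
# and are not taken from the paper's custom lexicon.
LEXICON = {"attack": 0.6, "kill": 1.0, "fight": 0.4, "destroy": 0.7}

TOKEN_RE = re.compile(r"[a-z']+")
MENTION_RE = re.compile(r"@(\w+)")


def score_text(text, lexicon=LEXICON):
    """Sum the weights of all lexicon terms that occur in the text."""
    tokens = TOKEN_RE.findall(text.lower())
    return sum(lexicon.get(token, 0.0) for token in tokens)


def score_users(tweets, lexicon=LEXICON):
    """Average the per-tweet scores for each user.

    `tweets` is an iterable of (user, text) pairs. Averaging per user is an
    assumption; the abstract does not say how tweet scores are aggregated.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user, text in tweets:
        totals[user] += score_text(text, lexicon)
        counts[user] += 1
    return {user: totals[user] / counts[user] for user in totals}


def color_for(score, low=0.5, high=1.0):
    """Map a score to a node color. The thresholds are arbitrary assumptions."""
    if score >= high:
        return "red"
    if score >= low:
        return "orange"
    return "green"


def build_graph(tweets, lexicon=LEXICON):
    """Return (node_colors, edges) for a simple discussion graph.

    Nodes are tweet authors colored by their average score; a directed edge
    (author, mentioned_user) is added for every @mention in a tweet.
    """
    tweets = list(tweets)
    scores = score_users(tweets, lexicon)
    node_colors = {user: color_for(score) for user, score in scores.items()}
    edges = set()
    for user, text in tweets:
        for mentioned in MENTION_RE.findall(text):
            edges.add((user, mentioned.lower()))
    return node_colors, edges


sample_tweets = [
    ("alice", "They will attack and destroy @bob"),
    ("alice", "nice"),
    ("bob", "Lovely weather today"),
]
node_colors, edges = build_graph(sample_tweets)
```

The resulting `node_colors` mapping and `edges` set could then be passed to a graph library for drawing. A production version would also need to handle negation, multi-word terms, and context, which simple token matching ignores.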
