Using common-sense knowledge-base for detecting word obfuscation in adversarial communication

2015 7th International Conference on Communication Systems and Networks (COMSNETS) Pub Date : 2015-01-01 DOI:10.1109/COMSNETS.2015.7098738

Swati Agarwal, A. Sureka


Citations: 7

Abstract

Word obfuscation or substitution means replacing one word with another word in a sentence to conceal the textual content or communication. Word obfuscation is used in adversarial communication by terrorists or criminals to convey their messages without being red-flagged by security and intelligence agencies that intercept or scan messages (such as emails and telephone conversations). ConceptNet is a freely available semantic network represented as a directed graph in which nodes are concepts and edges are common-sense assertions about those concepts. We present an approach that exploits the vast amount of semantic knowledge in ConceptNet to address the technically challenging problem of word substitution in adversarial communication. We frame the problem as a textual reasoning and context inference task and use ConceptNet's natural-language-processing toolkit to detect word substitution. We use ConceptNet to compute the conceptual similarity between any two given terms and define a Mean Average Conceptual Similarity (MACS) metric to identify out-of-context terms. The test-bed for evaluating the proposed approach consists of the Enron email dataset (over 600,000 emails generated by 158 employees of Enron Corporation) and the Brown corpus (about a million words drawn from a wide variety of sources). We implement word substitution techniques used by previous researchers to generate a test dataset, and conduct a series of experiments with these substitution methods to evaluate our approach. Experimental results reveal that the proposed approach is effective.
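
The idea of scoring each term by its conceptual similarity to the rest of the sentence can be illustrated with a minimal Python sketch. The abstract does not give the exact MACS formula, so this sketch assumes a simple variant: each term is scored by its mean similarity to the other terms, and the lowest-scoring term is treated as the out-of-context candidate. The similarity values in `TOY_SIMILARITY` are invented for illustration and do not come from ConceptNet, and the function names are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, FrozenSet, List

# Hypothetical similarity scores in [0, 1], invented for illustration only.
# In the paper's setting these values would be computed with ConceptNet.
TOY_SIMILARITY: Dict[FrozenSet[str], float] = {
    frozenset({"flower", "explode"}): 0.05,
    frozenset({"flower", "damage"}): 0.05,
    frozenset({"flower", "building"}): 0.10,
    frozenset({"explode", "damage"}): 0.70,
    frozenset({"explode", "building"}): 0.40,
    frozenset({"damage", "building"}): 0.50,
}


def toy_similarity(a: str, b: str) -> float:
    """Look up a symmetric similarity score; unknown pairs score 0.0."""
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def mean_similarity_per_term(
    terms: List[str], similarity: Callable[[str, str], float]
) -> Dict[str, float]:
    """Score each term by its mean similarity to every other term.

    This is an assumed simplification, not the paper's MACS definition.
    """
    if len(terms) < 2:
        raise ValueError("need at least two terms to compare")
    scores: Dict[str, float] = {}
    for term in terms:
        others = [t for t in terms if t != term]
        scores[term] = mean(similarity(term, other) for other in others)
    return scores


def most_out_of_context(
    terms: List[str], similarity: Callable[[str, str], float]
) -> str:
    """Return the term with the lowest mean similarity to the rest."""
    scores = mean_similarity_per_term(terms, similarity)
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Content words of a sentence in which "bomb" was replaced by "flower".
    content_words = ["flower", "explode", "damage", "building"]
    for word, score in mean_similarity_per_term(content_words, toy_similarity).items():
        print(f"{word}: {score:.3f}")
    print("suspect:", most_out_of_context(content_words, toy_similarity))
```

Passing the similarity function as a parameter keeps the scoring logic independent of where the scores come from, so the toy table could be swapped for a real ConceptNet-backed lookup without changing the rest of the sketch.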

