Mixed-code text analysis for the detection of online hidden propaganda

Proceedings of the 15th International Conference on Availability, Reliability and Security, Pub Date: 2020-08-25, DOI: 10.1145/3407023.3409211

A. Tundis, G. Mukherjee, M. Mühlhäuser


Citations: 4

Abstract

Internet-based communication systems are increasingly used as a tool for spreading misinformation and propaganda. Although mechanisms for tracking unwarranted information and messages exist, users have devised various methods to avoid scrutiny and detection. One such method is the use of mixed-code language. Mixed code is text written in an unconventional form that combines different languages, symbols, scripts and shapes, with the aim of making it difficult to detect because of its custom approach and ever-changing nature. It uses special characters as substitutes for letters of the alphabet, which keeps the text readable to humans but makes it nonsensical to machines. The intuition is that a substituted character should resemble the shape of the intended letter. In this context, the paper explores the possibility of identifying such mixed-code texts with special characters by proposing an approach to normalize them and determine whether they contain propaganda elements. To this end, a tailored algorithm combined with deep-learning-based character selection is defined and presented. The results gathered from its experimentation are discussed, and the achieved performance is compared with related work.
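The normalization idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which relies on a deep learning component for character selection; it only shows the simpler baseline of a fixed lookup table that maps visually similar characters back to Latin letters. The table `LOOKALIKES` and the functions `normalize_token` and `normalize_text` are hypothetical names introduced here for illustration.

```python
import unicodedata

# Hypothetical lookalike table (an assumption for illustration, not taken
# from the paper). Each key is a character that visually resembles the
# Latin letter it maps to.
LOOKALIKES = {
    "@": "a", "4": "a", "3": "e", "1": "l",
    "0": "o", "$": "s", "5": "s", "7": "t",
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
}


def normalize_token(token: str) -> str:
    """Map lookalike characters in one token back to Latin letters."""
    # NFKC folds compatibility forms (for example fullwidth letters).
    folded = unicodedata.normalize("NFKC", token).lower()
    # Tokens without any letter (plain numbers such as "2020") are left as-is.
    if not any(ch.isalpha() for ch in folded):
        return folded
    return "".join(LOOKALIKES.get(ch, ch) for ch in folded)


def normalize_text(text: str) -> str:
    """Normalize every whitespace-separated token of the input text."""
    return " ".join(normalize_token(tok) for tok in text.split())


if __name__ == "__main__":
    print(normalize_text("pr0p@g4nd4 in 2020"))
```

A fixed table like this is context-free: it cannot tell whether "1" stands for "l" or "i", and it would rewrite a legitimate digit inside an alphanumeric token. That ambiguity is one plausible reason a learned character-selection step, as mentioned in the abstract, could be preferable to a static mapping.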




Source journal

Proceedings of the 15th International Conference on Availability, Reliability and Security

Self-citation rate

0.00%

Publication count