Mixed-code text analysis for the detection of online hidden propaganda

A. Tundis, G. Mukherjee, M. Mühlhäuser
Proceedings of the 15th International Conference on Availability, Reliability and Security, published 2020-08-25
DOI: 10.1145/3407023.3409211 (https://doi.org/10.1145/3407023.3409211)
Citations: 4

Abstract

Internet-based communication systems are increasingly used as a tool for spreading misinformation and propaganda. Although mechanisms for tracking unwarranted information and messages exist, users have devised various methods to avoid scrutiny and detection. One such method is the use of mixed-code language. Mixed code is text written in an unconventional form that combines different languages, symbols, scripts and shapes; its custom nature and constantly changing form make it difficult to detect. Special characters are substituted for letters of the alphabet, which keeps the text readable to humans but makes it nonsensical to machines. The intuition is that a substituted character should resemble the shape of the intended letter. In this context, the paper explores the possibility of identifying such mixed-code texts with special characters by proposing an approach to normalize them and determine whether they contain propaganda elements. To this end, a tailored algorithm combined with a deep learning model for character selection is defined and presented. The results of the experiments are discussed, and the achieved performance is compared with related work.
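
The normalization idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the paper selects characters with a deep learning model, whereas this sketch swaps in a hand-written lookup table. The names `LOOKALIKES`, `FLAGGED_TERMS`, `normalize` and `contains_flagged_term`, as well as every table entry, are hypothetical and chosen only for illustration.

```python
# Hypothetical lookup table: each special character maps to the letter whose
# shape it resembles. The entries are illustrative assumptions, not data from
# the paper, which uses a deep learning model for character selection.
LOOKALIKES = {
    "@": "a",
    "4": "a",
    "3": "e",
    "1": "l",
    "!": "i",
    "0": "o",
    "$": "s",
    "5": "s",
    "7": "t",
}

# Hypothetical keyword list standing in for a real propaganda classifier.
FLAGGED_TERMS = {"propaganda"}


def normalize(text: str) -> str:
    """Lowercase the text and replace each look-alike with its assumed letter."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in text.lower())


def contains_flagged_term(text: str) -> bool:
    """Check the normalized words against the illustrative keyword list."""
    words = normalize(text).split()
    return any(word.strip(".,;:?") in FLAGGED_TERMS for word in words)


if __name__ == "__main__":
    sample = "Spread the pr0p@g4nd@ now"
    print(normalize(sample))
    print(contains_flagged_term(sample))
```

A fixed table like this rewrites every listed character regardless of context, so genuine digits and punctuation are altered as well. That limitation is one plausible reason to prefer a learned character-selection step over a static mapping.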