{"title":"Mixed-code text analysis for the detection of online hidden propaganda","authors":"A. Tundis, G. Mukherjee, M. Mühlhäuser","doi":"10.1145/3407023.3409211","DOIUrl":null,"url":null,"abstract":"Internet-based communication systems have become an increasing tool for spreading misinformation and propaganda. Though mechanisms adept in tracking unwarranted information and messages exist, users have devised different methods to avoid scrutiny and detection. One of such method is the use of mixed-code language. Mixed code is text written in an unconventional form combining different languages, symbols, scripts and shapes, with the aim to make it difficult to detect due to its custom approach and its ever changing aspects. Utilizing special characters to substitute for alphabets, which makes it readable to humans but nonsensical to machine. The intuition is that a substituted alphabet should resemble the shape of the intended alphabet. In this context, the paper explores the possibility of identifying such mixed code texts with special characters by proposing an approach to normalize them and determine if it contains propaganda elements. As a consequence, a tailored algorithm in combination with a deep learning models for character selection is defined and presented. The results gathered from its experimentation are discussed and the achieved performances are compared with the related works1.","PeriodicalId":121225,"journal":{"name":"Proceedings of the 15th International Conference on Availability, Reliability and Security","volume":"85 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 15th International Conference on Availability, Reliability and Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3407023.3409211","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
Internet-based communication systems have become an increasing tool for spreading misinformation and propaganda. Though mechanisms adept in tracking unwarranted information and messages exist, users have devised different methods to avoid scrutiny and detection. One of such method is the use of mixed-code language. Mixed code is text written in an unconventional form combining different languages, symbols, scripts and shapes, with the aim to make it difficult to detect due to its custom approach and its ever changing aspects. Utilizing special characters to substitute for alphabets, which makes it readable to humans but nonsensical to machine. The intuition is that a substituted alphabet should resemble the shape of the intended alphabet. In this context, the paper explores the possibility of identifying such mixed code texts with special characters by proposing an approach to normalize them and determine if it contains propaganda elements. As a consequence, a tailored algorithm in combination with a deep learning models for character selection is defined and presented. The results gathered from its experimentation are discussed and the achieved performances are compared with the related works1.