Mixed-code text analysis for the detection of online hidden propaganda
A. Tundis, G. Mukherjee, M. Mühlhäuser
Proceedings of the 15th International Conference on Availability, Reliability and Security, 2020-08-25
DOI: 10.1145/3407023.3409211
Internet-based communication systems have increasingly become a tool for spreading misinformation and propaganda. Although mechanisms adept at tracking unwarranted information and messages exist, users have devised various methods to avoid scrutiny and detection. One such method is the use of mixed-code language: text written in an unconventional form that combines different languages, symbols, scripts and shapes, and whose customized, ever-changing character makes it difficult to detect. A common variant substitutes special characters for letters, which keeps the text readable to humans but renders it nonsensical to machines. The intuition is that a substituted character resembles the shape of the letter it replaces. In this context, the paper explores identifying such mixed-code texts containing special characters by proposing an approach that normalizes them and determines whether they contain propaganda elements. To this end, a tailored algorithm combined with deep learning models for character selection is defined and presented. The experimental results are discussed, and the achieved performance is compared with related work.
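The normalization idea can be illustrated with a minimal Python sketch. It assumes a hand-written look-alike table (`HOMOGLYPHS`) and a small vocabulary (`VOCABULARY`), and both are hypothetical. The paper selects characters with deep learning models. This sketch substitutes a much simpler technique: it enumerates the letter spellings each look-alike could stand for and keeps the first spelling found in the vocabulary. `flags_watchlist` is also a hypothetical keyword check and does not represent the paper's propaganda detector.

```python
from itertools import islice, product

# Hypothetical look-alike table (NOT the paper's): each special character maps
# to the letters whose shape it is assumed to imitate.
HOMOGLYPHS: dict[str, list[str]] = {
    "@": ["a"], "4": ["a"], "3": ["e"], "1": ["i", "l"], "!": ["i", "l"],
    "|": ["l", "i"], "0": ["o"], "$": ["s"], "5": ["s"], "7": ["t"], "+": ["t"],
}

# Hypothetical vocabulary; a crude stand-in for the learned character-selection model.
VOCABULARY = {"join", "the", "cause", "now", "all", "of", "us", "still"}

MAX_CANDIDATES = 1024  # cap on enumerated spellings per token


def candidates(token: str):
    """Yield spellings obtained by replacing each look-alike with a possible letter."""
    options = [HOMOGLYPHS.get(ch, [ch.lower()]) for ch in token]
    for combo in islice(product(*options), MAX_CANDIDATES):
        yield "".join(combo)


def normalize_token(token: str) -> str:
    """Return the first candidate found in the vocabulary, else the first candidate."""
    first = None
    for cand in candidates(token):
        if first is None:
            first = cand
        if cand in VOCABULARY:
            return cand
    return first if first is not None else token


def normalize(text: str) -> str:
    """Normalize every whitespace-separated token of a mixed-code text."""
    return " ".join(normalize_token(tok) for tok in text.split())


def flags_watchlist(text: str, watchlist: set[str]) -> bool:
    """Hypothetical check: does any normalized token appear on a watchlist?"""
    return any(tok in watchlist for tok in normalize(text).split())


print(normalize("J01n th3 c@u$3 n0w"))  # join the cause now
```

Enumerating candidates and looking them up in a dictionary keeps the sketch transparent. A practical system would instead score candidates with a trained model, as the paper does with deep learning, so that it can handle words outside any fixed vocabulary.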