F. M. R. Pardo, Paolo Rosso, A. Charfi, W. Zaghouani
Published in: 2019 IEEE International Conference on Intelligence and Security Informatics (ISI), 2019-07-01. DOI: 10.1109/ISI.2019.8823378 (https://doi.org/10.1109/ISI.2019.8823378)
Detecting Deceptive Tweets in Arabic for Cyber-Security
In the framework of the QNRF project on Arabic Author Profiling for Cyber-Security, we addressed deception detection in Arabic in order to discard messages that do not represent real potential threats. We applied the Low Dimensionality Statistical Embedding (LDSE) method to several Arabic corpora, including the Arabic Credibility corpus and two new corpora that we created: the Qatar Twitter corpus and the Qatar News corpus. On the Arabic Credibility corpus, we achieved a Macro F-measure of 0.797. Results obtained with two well-known distributed representations, Continuous Bag of Words and Skip-grams, showed the competitiveness of our approach. LDSE gave similar results on the two corpora that we created. We also evaluated our work in a cross-genre scenario, which showed that LDSE is robust when enough data about similar topics are available.
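For readers unfamiliar with the reported metric, the following is a minimal sketch of how a Macro F-measure is usually computed: the F1 score is calculated separately for each class and the per-class scores are averaged with equal weight. The function name `macro_f1` and the labels `credible` and `deceptive` are illustrative assumptions, and the toy data is not drawn from any of the corpora above; this is not the authors' evaluation code.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the true class but was missed
    scores = []
    for label in labels:
        pred_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / pred_total if pred_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with made-up labels (not data from the paper).
gold = ["credible", "credible", "deceptive", "deceptive"]
pred = ["credible", "deceptive", "deceptive", "deceptive"]
print(round(macro_f1(gold, pred), 4))
```

Because every class contributes equally to the average, a classifier cannot reach a high Macro F-measure by favouring the majority class alone, which is why the metric is common for deception and credibility tasks with imbalanced labels.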