Multi-Platform Authorship Verification

Proceedings of the Third Central European Cybersecurity Conference. Pub Date: 2019-11-14. DOI: 10.1145/3360664.3360677

Abdulaziz Altamimi, N. Clarke, S. Furnell, Fudong Li


Citations: 5

Abstract

Messaging systems such as social network messaging, text messages, email and Twitter have grown rapidly in variety and popularity, and users frequently exchange messages across several platforms. Unfortunately, amongst the legitimate messages there is a host of illegitimate and inappropriate content, with cyberstalking, trolling and computer-assisted crime all taking place. There is therefore a need to identify the individuals using messaging systems. Stylometry is the study of linguistic features in a text; applied to authorship verification, it checks whether a target text was written by a specific individual author, based on that author's writing style. Whilst much research has taken place within authorship verification, studies have focused upon single platforms and have often relied on limited datasets and restricted methodologies, which makes it difficult to appreciate the real-world value of the approach. This paper seeks to overcome these limitations by providing an analysis of authorship verification across four common messaging systems. This approach enables a direct comparison of recognition performance and provides a basis for analyzing the feature vectors across platforms, to better understand which aspects each platform capitalizes upon to achieve good classification. The experiments also investigate feature vector creation, comparing population-based and user-based techniques. The experiment involved 50 participants across four common platforms, with a total of 13,617, 106,359, 4,539 and 6,540 samples for Twitter, SMS, Facebook and Email, achieving Equal Error Rates (EER) of 20.16%, 7.97%, 25% and 13.11% respectively.
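For readers unfamiliar with the metric, the Equal Error Rate is the operating point at which the false acceptance rate (impostor texts accepted as the claimed author) equals the false rejection rate (genuine texts rejected). The following is a minimal sketch of how an EER can be estimated from verification scores; it is not the authors' implementation. The function name `equal_error_rate`, the assumption that higher scores mean "more likely the claimed author", and the toy scores are all illustrative and do not come from the paper.

```python
from typing import Sequence, Tuple


def equal_error_rate(
    genuine: Sequence[float], impostor: Sequence[float]
) -> Tuple[float, float]:
    """Estimate (eer, threshold) from similarity scores.

    Assumption: a sample is accepted when its score >= threshold,
    so higher scores favour the claimed author.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")

    best_gap = None
    best_eer = 0.0
    best_threshold = 0.0
    # Every observed score is a candidate decision threshold.
    for t in sorted(set(genuine) | set(impostor)):
        # False acceptance rate: impostor samples that pass the threshold.
        far = sum(s >= t for s in impostor) / len(impostor)
        # False rejection rate: genuine samples that fail the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        # Keep the first threshold where FAR and FRR are closest.
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2
            best_threshold = t
    return best_eer, best_threshold


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    genuine_scores = [0.9, 0.8, 0.7, 0.4]
    impostor_scores = [0.6, 0.3, 0.2, 0.1]
    eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

With a finite number of scores, FAR and FRR rarely cross exactly, so the sketch reports the midpoint of the two rates at the threshold where they are closest. A lower EER indicates better verification performance, which is why the 7.97% reported for SMS is the strongest of the four results.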

