Stylometric Anonymity: Is Imitation the Best Strategy?
Mahmoud Khonji, Y. Iraqi
2015 IEEE Trustcom/BigDataSE/ISPA, 2015-08-20
DOI: https://doi.org/10.1109/Trustcom.2015.472
Stylometric analysis of electronic texts can extract information about their authors by examining the stylistic choices the authors make when writing. Such information may include the identity of a suspected author or profile attributes such as gender, age group, or ethnic group. Therefore, when preserving an author's anonymity is critical, as with a whistleblower, it is important to ensure the stylistic anonymity of the conveyed text itself in addition to anonymizing communication channels (e.g., using Tor or minimizing application fingerprints). Currently, only two stylistic anonymization strategies are known: imitation attacks and obfuscation attacks. A long-term objective is to find automated methods that reliably transform input texts so that the output texts maximize author anonymity while reasonably preserving the semantics of the input. Before pursuing that objective, it is important to first identify which strategies are effective at maximizing stylistic anonymity. The current literature implies that imitation attacks preserve author anonymity better than obfuscation attacks. However, we argue that such evaluations are limited and should not be generalized to stylistic anonymity as a whole, because they were executed only against authorship attribution (AA) solvers, which address a closed-set problem. In this study, we extend these evaluations to state-of-the-art authorship verification (AV) solvers, which address an open-set problem. Our results show that imitation attacks degrade the classification accuracy of AV solvers more aggressively than that of AA solvers. We argue that this reduction in accuracy to below random-chance guessing renders imitation attacks inferior to obfuscation attacks. Furthermore, because we present a general formal notation for stylometry problems, we conjecture that the same observations apply to all stylometry problems (AA, AV, AP, SI).
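One way to read the below-chance argument is that, on a binary decision such as authorship verification, a solver whose accuracy falls below 50% still carries signal: an adversary who suspects an imitation attack could simply invert its answers. The sketch below illustrates that arithmetic only. The labels and predictions are hypothetical toy values, not results from the study, and the helper names `accuracy` and `invert` are assumptions introduced for illustration.

```python
def accuracy(predictions, truths):
    """Fraction of predictions that match the ground truth."""
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    return sum(p == t for p, t in zip(predictions, truths)) / len(truths)


def invert(predictions):
    """Flip every binary decision (same-author <-> different-author)."""
    return [not p for p in predictions]


# Hypothetical ground truth: True means "same author", False means "different author".
truths = [True, True, True, True, True, False, False, False, False, False]

# Hypothetical verifier output under an imitation attack: wrong on 7 of 10 cases.
attacked = [False, False, False, False, True, True, True, True, False, False]

below_chance = accuracy(attacked, truths)        # 3 of 10 correct
recovered = accuracy(invert(attacked), truths)   # 7 of 10 correct

print(f"accuracy under attack:   {below_chance:.2f}")
print(f"accuracy after flipping: {recovered:.2f}")
```

Under this reading, an attack that drives a binary solver toward 50% accuracy leaves the adversary with nothing to exploit, whereas one that drives it well below 50% does not.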