{"title":"用文体学给程序员和作家定性","authors":"R. Greenstadt","doi":"10.1145/3082031.3092567","DOIUrl":null,"url":null,"abstract":"In this talk, I will discuss my lab's work in the emerging field of adversarial stylometry and machine learning. Machine learning algorithms are increasingly being used in security and privacy domains, in areas that go beyond intrusion or spam detection. For example, in digital forensics, questions often arise about the authors of documents: their identity, demographic background, and whether they can be linked to other documents. The field of stylometry uses linguistic features and machine learning techniques to answer these questions. We have applied stylometry to difficult domains such as underground hacker forums, open source projects (code), and tweets. I will discuss our Doppelgnger Finder algorithm, which enables us to group Sybil accounts on underground forums and detect blogs from Twitter feeds and reddit comments. In addition, I will discuss our work attributing unknown source code and binaries.","PeriodicalId":431672,"journal":{"name":"Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Using Stylometry to Attribute Programmers and Writers\",\"authors\":\"R. Greenstadt\",\"doi\":\"10.1145/3082031.3092567\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this talk, I will discuss my lab's work in the emerging field of adversarial stylometry and machine learning. Machine learning algorithms are increasingly being used in security and privacy domains, in areas that go beyond intrusion or spam detection. For example, in digital forensics, questions often arise about the authors of documents: their identity, demographic background, and whether they can be linked to other documents. The field of stylometry uses linguistic features and machine learning techniques to answer these questions. We have applied stylometry to difficult domains such as underground hacker forums, open source projects (code), and tweets. I will discuss our Doppelgnger Finder algorithm, which enables us to group Sybil accounts on underground forums and detect blogs from Twitter feeds and reddit comments. In addition, I will discuss our work attributing unknown source code and binaries.\",\"PeriodicalId\":431672,\"journal\":{\"name\":\"Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security\",\"volume\":\"35 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-06-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3082031.3092567\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3082031.3092567","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Using Stylometry to Attribute Programmers and Writers
In this talk, I will discuss my lab's work in the emerging field of adversarial stylometry and machine learning. Machine learning algorithms are increasingly being used in security and privacy domains, in areas that go beyond intrusion or spam detection. For example, in digital forensics, questions often arise about the authors of documents: their identity, demographic background, and whether they can be linked to other documents. The field of stylometry uses linguistic features and machine learning techniques to answer these questions. We have applied stylometry to difficult domains such as underground hacker forums, open source projects (code), and tweets. I will discuss our Doppelgnger Finder algorithm, which enables us to group Sybil accounts on underground forums and detect blogs from Twitter feeds and reddit comments. In addition, I will discuss our work attributing unknown source code and binaries.