The effect of time drift in source code authorship attribution: Time drifting in source code - stylochronometry
Juraj Petrík, D. Chudá
Proceedings of the 22nd International Conference on Computer Systems and Technologies, 2021-06-18
https://doi.org/10.1145/3472410.3472445
Stylochronometry studies how an author's style changes over time, specifically how time affects stylometric features. Analyzing whether time drift occurs is especially important when creating datasets for other work in this area. In this paper, we performed experiments on the Google Code Jam dataset to show the influence of time drift on source code authorship attribution. Our experiments revealed significant time drift in stylometric features at a one-year difference, and the drift grows as the time difference increases. Another interesting result is that when our authorship attribution method is trained on data from the future and tested on data from the past, the time drift is lower than in the opposite direction. We also found a relation between the length of the source code and the accuracy of our authorship attribution method.
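The train-on-one-year, test-on-another setup described above can be sketched roughly as follows. This is only an illustration, not the authors' method: the abstract does not specify the feature set or the classifier, so the layout features, the nearest-centroid classifier, the corpus layout, and all names (`style_features`, `drift_matrix`, `toy_corpus`) are assumptions chosen for brevity, and the sample programs are made-up placeholders rather than Google Code Jam data.

```python
import math
from collections import defaultdict


def style_features(source):
    """Toy stylometric features (assumed, not from the paper): layout cues."""
    n = max(len(source), 1)
    lines = source.splitlines() or [""]
    return {
        "tabs": source.count("\t") / n,
        "spaces": source.count(" ") / n,
        "brace_own_line": sum(line.strip() == "{" for line in lines) / len(lines),
        "avg_line_len": sum(len(line) for line in lines) / len(lines) / 100.0,
    }


def centroid(vectors):
    """Mean feature vector of one author's samples."""
    return {k: sum(v[k] for v in vectors) / len(vectors) for k in vectors[0]}


def distance(a, b):
    """Euclidean distance between two feature dictionaries."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def train(samples):
    """samples: list of (author, source). Returns {author: centroid}."""
    by_author = defaultdict(list)
    for author, source in samples:
        by_author[author].append(style_features(source))
    return {author: centroid(vs) for author, vs in by_author.items()}


def predict(model, source):
    """Attribute a source file to the author with the nearest centroid."""
    feats = style_features(source)
    return min(model, key=lambda author: distance(model[author], feats))


def accuracy(model, samples):
    hits = sum(predict(model, source) == author for author, source in samples)
    return hits / len(samples)


def drift_matrix(corpus):
    """corpus: {year: [(author, source), ...]}.

    Returns {(train_year, test_year): accuracy} for every ordered pair of
    distinct years, so both time directions can be compared.
    """
    return {
        (train_year, test_year): accuracy(train(corpus[train_year]), corpus[test_year])
        for train_year in corpus
        for test_year in corpus
        if train_year != test_year
    }


if __name__ == "__main__":
    # Placeholder programs for two hypothetical authors in two years.
    toy_corpus = {
        2019: [("a", "int main()\n{\n\treturn 0;\n}\n"),
               ("b", "int main() {\n    return 0;\n}\n")],
        2020: [("a", "int main()\n{\n\tint x = 1;\n\treturn x;\n}\n"),
               ("b", "int main() {\n  int x = 1;\n  return x;\n}\n")],
    }
    for (train_year, test_year), acc in sorted(drift_matrix(toy_corpus).items()):
        print(f"train {train_year} -> test {test_year}: accuracy {acc:.2f}")
```

Comparing the entry for `(earlier, later)` with the entry for `(later, earlier)` is one way to look at the direction asymmetry the abstract reports, and grouping entries by the gap between the two years gives a view of how accuracy changes as the time difference increases.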