Assessing the Level of Stability of Idiolectal Features across Modes, Topics and Time of Text Production
T. Litvinova, O. Litvinova, Pavel Seredin
2018 23rd Conference of Open Innovations Association (FRUCT), published 2018-11-01
DOI: https://doi.org/10.23919/FRUCT.2018.8588092
Authorship attribution, i.e., the task of identifying the author of a disputed text, is one of the challenging issues facing digital forensics. Cross-domain authorship attribution, in which training and test texts differ in genre, topic and even mode (written/oral), is the most realistic yet the most difficult scenario. All authorship attribution studies rely on the notion of an idiolect, understood as a set of stable features, even though few studies have explored how stable idiolectal features actually are. The aim of this paper is to reveal, through a series of experiments, the effect of the mode, topic and time of text production on the stability of idiolectal features. Our pilot study revealed that a change of mode (written/oral) causes the most striking differences in text parameters compared with changes in topic or time of production, although some features (namely, the relative frequencies of certain discourse markers) remain relatively stable in all experimental setups. We conclude that a corpus containing diverse types of texts from each individual is needed to examine the stability of idiolectal features thoroughly and to develop cross-domain attribution techniques for use in realistic scenarios.
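The abstract names relative frequencies of discourse markers as a feature that stays comparatively stable, but it does not say how the frequencies or their stability were measured. The following is a minimal sketch of one way such a check could look, not the authors' method: the marker list `MARKERS`, the per-1,000-token normalization, and the use of the coefficient of variation as a stability score are all assumptions made for illustration.

```python
import re
from statistics import mean, pstdev

# Hypothetical marker list, chosen for illustration only (not from the paper).
MARKERS = ("well", "so", "however")


def relative_frequencies(text, markers=MARKERS):
    """Return occurrences of each marker per 1,000 word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {m: 0.0 for m in markers}
    return {m: 1000 * tokens.count(m) / len(tokens) for m in markers}


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if the mean is 0)."""
    avg = mean(values)
    return pstdev(values) / avg if avg else 0.0


def stability_by_marker(texts, markers=MARKERS):
    """Map each marker to the variation of its relative frequency across texts.

    Assumption: a lower value is read as a more stable feature.
    """
    freqs = [relative_frequencies(t, markers) for t in texts]
    return {m: coefficient_of_variation([f[m] for f in freqs]) for m in markers}
```

Under these assumptions, `texts` would hold samples from one author that differ in a single condition (for example, one written and one oral sample), and markers with a score near zero would be candidates for features that hold up across that condition.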