Assessing the Level of Stability of Idiolectal Features across Modes, Topics and Time of Text Production

T. Litvinova, O. Litvinova, Pavel Seredin
Published in: 2018 23rd Conference of Open Innovations Association (FRUCT)
Publication date: 2018-11-01
DOI: 10.23919/FRUCT.2018.8588092 (https://doi.org/10.23919/FRUCT.2018.8588092)
Citations: 6

Abstract

Authorship attribution, i.e., the task of revealing the author of a disputed text, is one of the challenging issues facing digital forensics. Cross-domain authorship attribution, in which training and test texts differ in genre, topic, and even mode (written/oral), is the most realistic yet the most difficult scenario. All authorship attribution studies rely on the notion of an idiolect, understood as a set of stable features, yet few studies have explored how stable idiolectal features actually are. The aim of the paper is to reveal, across a series of experiments, the effect of mode, topic, and time of text production on the stability of idiolectal features. Our pilot study revealed that a change of mode (written/oral) causes the most striking differences in text parameters, compared with a change of topic or time of production, although some features (namely, the relative frequencies of certain discourse markers) remain relatively stable in all experimental setups. We conclude that a corpus containing diverse types of texts from each individual is needed to examine the stability of idiolectal features thoroughly and to develop cross-domain attribution techniques that can be employed in realistic scenarios.
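The abstract names relative frequencies of certain discourse markers as the features that stay comparatively stable. As a rough illustration of what such a feature looks like, the Python sketch below computes per-marker relative frequencies for two texts and the absolute difference between the two profiles. It is not the authors' procedure: the marker list, the two sample sentences, and the function names are all assumptions made up for this example, and the abstract does not say which markers or which stability measure the study used.

```python
import re
from collections import Counter

# Hypothetical marker list, chosen for illustration only;
# the abstract does not say which discourse markers were studied.
MARKERS = ("well", "so", "actually", "however")


def relative_frequencies(text, markers=MARKERS):
    """Return each marker's count divided by the total number of word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {m: (counts[m] / total if total else 0.0) for m in markers}


def absolute_differences(freqs_a, freqs_b):
    """Per-marker absolute difference between two frequency profiles."""
    return {m: abs(freqs_a[m] - freqs_b[m]) for m in freqs_a}


# Toy texts invented for this sketch, not data from the study.
written = "So the results were clear. However, the sample was small."
spoken = "Well, so, the results were, well, actually pretty clear."

written_freqs = relative_frequencies(written)
spoken_freqs = relative_frequencies(spoken)
diffs = absolute_differences(written_freqs, spoken_freqs)

for marker in MARKERS:
    print(f"{marker:10s} written={written_freqs[marker]:.3f} "
          f"spoken={spoken_freqs[marker]:.3f} diff={diffs[marker]:.3f}")
```

Under this toy measure, a marker with a small difference across the written and spoken samples would count as relatively stable. A real study would need many texts per author and a proper statistical comparison rather than a single pairwise difference.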