{"title":"Hacking stylometry with multiple voices: Imaginary writers can override authorial signal in Delta","authors":"Daniil Skorinkin, Boris Orekhov","doi":"10.1093/llc/fqad012","DOIUrl":null,"url":null,"abstract":"Abstract It is a basic assumption of stylometry that texts written by the same person show greater stylometric similarity even if published under multiple pennames. Statistical authorship attribution strongly relies on the ability of Burrows’s Delta and its variants to cluster one author together regardless of pseudonyms. At the same time, the very first computational discoveries by the founder of modern stylometry showed that a single author is capable of producing multiple voices (Burrows, 1987, Computation into Criticism: A Study of Jane Austen’s Novels and an Experiment in Method. Clarendon Press). We investigate two authors whose stylistically autonomous pennames seem to deceive Delta and override authorial signals: a Portuguese poet Fernando Pessoa and a French novelist Romain Gary. Pessoa managed to create at least three pennames (the author himself used the term ‘heteronym’) who exhibit all traits of individual human beings from the stylometric point of view. Gary’s alter ego Emile Ajar, who was an intentional literary mystification, also demonstrates traits of stylometric autonomy. At the same time, other pseudonyms used by Gary lack that autonomy completely. Our investigation shows that there appears to be a continuum between a purely formal use of a penname, which brings almost no distinction from the real name of an author, and a strong literary sub-personality such as those created by Pessoa.","PeriodicalId":45315,"journal":{"name":"Digital Scholarship in the Humanities","volume":"31 1","pages":"0"},"PeriodicalIF":0.7000,"publicationDate":"2023-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Digital Scholarship in the Humanities","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/llc/fqad012","RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"HUMANITIES, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 1
Abstract
Abstract It is a basic assumption of stylometry that texts written by the same person show greater stylometric similarity even if published under multiple pennames. Statistical authorship attribution strongly relies on the ability of Burrows’s Delta and its variants to cluster one author together regardless of pseudonyms. At the same time, the very first computational discoveries by the founder of modern stylometry showed that a single author is capable of producing multiple voices (Burrows, 1987, Computation into Criticism: A Study of Jane Austen’s Novels and an Experiment in Method. Clarendon Press). We investigate two authors whose stylistically autonomous pennames seem to deceive Delta and override authorial signals: a Portuguese poet Fernando Pessoa and a French novelist Romain Gary. Pessoa managed to create at least three pennames (the author himself used the term ‘heteronym’) who exhibit all traits of individual human beings from the stylometric point of view. Gary’s alter ego Emile Ajar, who was an intentional literary mystification, also demonstrates traits of stylometric autonomy. At the same time, other pseudonyms used by Gary lack that autonomy completely. Our investigation shows that there appears to be a continuum between a purely formal use of a penname, which brings almost no distinction from the real name of an author, and a strong literary sub-personality such as those created by Pessoa.
期刊介绍:
DSH or Digital Scholarship in the Humanities is an international, peer reviewed journal which publishes original contributions on all aspects of digital scholarship in the Humanities including, but not limited to, the field of what is currently called the Digital Humanities. Long and short papers report on theoretical, methodological, experimental, and applied research and include results of research projects, descriptions and evaluations of tools, techniques, and methodologies, and reports on work in progress. DSH also publishes reviews of books and resources. Digital Scholarship in the Humanities was previously known as Literary and Linguistic Computing.