Quantification and identification of authorial writing style through higher-order text network modeling and analysis
Authors: Hongzhong Deng, Chengxing Wu, Bingfeng Ge, Hongqian Wu
Journal of Informetrics, volume 19, issue 1, Article 101603. Published 2024-11-19.
DOI: 10.1016/j.joi.2024.101603
URL: https://www.sciencedirect.com/science/article/pii/S1751157724001159
Determining the true author of anonymized texts has important applications ranging from text classification and information extraction to forensic investigations. Despite substantial progress, current authorship identification solutions are limited to extracting straightforward semantic relationships in writing styles and do not consider higher-order features among multiple words, phrases, or sentences in language structure. Here, we propose a novel approach based on hypernetwork theory that encodes higher-order text features into a unified text hyper-network, and we investigate whether the higher-order topological features of this hyper-network help reveal an author's stylistic preferences. Our results indicate that metrics of the text hyper-network, such as hyperdegree, average shortest path length, and intermittency, capture more information about an author's writing style. More importantly, in an author identification task on 170 novels, our method correctly identified the authorship of 81% of the novels, surpassing the accuracy of a method based on pairwise word relationships. This further highlights the importance of higher-order features in text analysis, beyond mere pairwise interactions of words.
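To make two of the named metrics concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say how the text hyper-network is built, so the sketch assumes that each sentence forms one hyperedge over its distinct lowercase words, that hyperdegree is the number of hyperedges containing a word, and that two words are adjacent when they share a hyperedge. The function names `build_hypergraph`, `hyperdegree`, and `average_shortest_path_length` are hypothetical.

```python
import re
from collections import defaultdict, deque


def build_hypergraph(text):
    """Assumed construction: one hyperedge (set of distinct words) per sentence."""
    edges = []
    for sentence in re.split(r"[.!?]+", text):
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        if words:  # skip empty fragments left by the split
            edges.append(words)
    return edges


def hyperdegree(edges):
    """Number of hyperedges that contain each word."""
    degree = defaultdict(int)
    for edge in edges:
        for word in edge:
            degree[word] += 1
    return dict(degree)


def average_shortest_path_length(edges):
    """Mean hop distance over all reachable ordered pairs of distinct words.

    Assumption: two words are adjacent if they co-occur in a hyperedge.
    Pairs in different connected components are ignored.
    """
    adjacency = defaultdict(set)
    for edge in edges:
        for word in edge:
            adjacency[word] |= edge - {word}

    total, pairs = 0, 0
    for source in adjacency:
        # Breadth-first search gives shortest hop counts from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    sample = "The cat sat. The dog sat. A bird flew."
    hyperedges = build_hypergraph(sample)
    print(hyperdegree(hyperedges))
    print(average_shortest_path_length(hyperedges))
```

Under these assumptions a word that recurs across many sentences gets a high hyperdegree, which a pairwise co-occurrence network would only reflect indirectly through its edge count. A real study would need proper sentence segmentation and tokenization in place of the regular expressions used here.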
About the journal:
Journal of Informetrics (JOI) publishes rigorous high-quality research on quantitative aspects of information science. The main focus of the journal is on topics in bibliometrics, scientometrics, webometrics, patentometrics, altmetrics and research evaluation. Contributions studying informetric problems using methods from other quantitative fields, such as mathematics, statistics, computer science, economics and econometrics, and network science, are especially encouraged. JOI publishes both theoretical and empirical work. In general, case studies, for instance a bibliometric analysis focusing on a specific research field or a specific country, are not considered suitable for publication in JOI, unless they contain innovative methodological elements.