Author: V. A. Yatsko
Journal: Automatic Documentation and Mathematical Linguistics (Impact Factor 0.5; JCR Q4, Computer Science, Information Systems)
Publication type: Journal Article
Publication date: 2022-12-06
DOI: 10.3103/S0005105522050041
URL: https://link.springer.com/article/10.3103/S0005105522050041
Patterns of Using the Z-Score for Text Classification Purposes
This paper describes procedures for using the Z-score to classify text documents. The author tested the efficiency of this approach on authorship attribution and genre classification tasks, based on an analysis of the distribution of stop words. The paper finds that calculating the score from raw counts of stop words produces a negative result, whereas calculating it from the deviations of stop-word frequencies from the Zipfian score yields higher classification efficiency. A comparison with the previously developed Y-method showed that the Z-score is more efficient for text classification.
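The abstract does not give the exact formulas, so the following is only a minimal sketch of the two variants it contrasts: Z-scores computed from raw stop-word counts, and Z-scores computed from the deviations of those counts from a Zipfian expectation. Everything specific here is an assumption rather than the paper's procedure: the tiny `STOP_WORDS` list, the whitespace tokenizer, the Zipf model `f(r) = f(1) / r`, the use of the population standard deviation, and the mean-absolute-difference comparison in `profile_distance`.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical, deliberately tiny stop-word list (not taken from the paper).
STOP_WORDS = ("the", "of", "and", "to", "a", "in")


def stop_word_counts(text, stop_words=STOP_WORDS):
    """Count each stop word in a text using naive whitespace tokenization."""
    counts = Counter(t for t in text.lower().split() if t in stop_words)
    return [counts[w] for w in stop_words]


def z_scores(values):
    """Standardize values: (x - mean) / population standard deviation."""
    if not values:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so no value deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def zipf_deviations(counts):
    """Observed count minus an assumed Zipfian expectation f(1) / rank.

    Ranks are assigned by descending observed count within the text.
    """
    if not counts:
        return []
    order = sorted(range(len(counts)), key=lambda i: -counts[i])
    top = counts[order[0]]
    deviations = [0.0] * len(counts)
    for rank, i in enumerate(order, start=1):
        deviations[i] = counts[i] - top / rank
    return deviations


def profile_distance(text_a, text_b, use_zipf=True):
    """Mean absolute difference between two texts' Z-score profiles.

    This comparison rule is an illustrative assumption, not the paper's.
    """
    profiles = []
    for text in (text_a, text_b):
        counts = stop_word_counts(text)
        basis = zipf_deviations(counts) if use_zipf else counts
        profiles.append(z_scores(basis))
    return mean(abs(a - b) for a, b in zip(*profiles))
```

Under these assumptions, a text of unknown author or genre would be assigned to the class whose reference text gives the smallest `profile_distance`; setting `use_zipf=False` switches to the raw-count variant that the abstract reports as unsuccessful.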
About the journal:
Automatic Documentation and Mathematical Linguistics is an international peer-reviewed journal that covers all aspects of the automation of information processes and systems, as well as algorithms and methods for automatic language analysis. It emphasizes practical applications of new technologies and techniques for information analysis and processing.