A comparison of two text specificity measures analyzing a heterogenous text corpus
A. Oleinik
Glottometrics, 2023-01-01 (Journal Article)
DOI: 10.53482/2023_54_404 (https://doi.org/10.53482/2023_54_404)
Impact factor: 0.2; JCR: Q4 (Linguistics)
Citations: 1
Abstract
The article compares the performance of two term specificity measures, Cohen’s d and Z-score, when analyzing political and media discourses on Russia’s war in Ukraine in four languages and five countries. Beyond their linguistic and stylistic heterogeneity, the 3,347 texts included in the corpus vary in length. The two measures display convergent validity, as confirmed by various performance metrics. It is argued that the measures can be adapted to a broader range of tasks in information retrieval and digital humanities, in addition to their usefulness for text mining and content analysis.
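For readers unfamiliar with the two measures named in the abstract, the following is a minimal sketch of their textbook forms, not the article's own formulation, which the abstract does not spell out. It assumes Cohen's d is computed on per-text relative frequencies of a term (one way to cope with variable text length) using a pooled standard deviation, and that the Z-score is the normal approximation to the binomial for a term's count in a subcorpus. The function and parameter names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(target_rates, reference_rates):
    """Textbook Cohen's d: difference of means over the pooled sample SD.

    Assumption: each list holds a term's relative frequency per text,
    one list for the subcorpus of interest and one for the reference texts.
    """
    n1, n2 = len(target_rates), len(reference_rates)
    s1, s2 = stdev(target_rates), stdev(reference_rates)
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(target_rates) - mean(reference_rates)) / pooled_sd


def specificity_z(term_count, subcorpus_tokens, corpus_proportion):
    """Z-score of a term's observed count against its expected count.

    Assumption: the count is modeled as binomial with n = subcorpus_tokens
    and p = the term's share of tokens in the whole corpus.
    """
    expected = subcorpus_tokens * corpus_proportion
    sd = sqrt(subcorpus_tokens * corpus_proportion * (1 - corpus_proportion))
    return (term_count - expected) / sd


if __name__ == "__main__":
    # Illustrative numbers only, not data from the article.
    print(cohens_d([2, 4, 6], [1, 3, 5]))
    print(specificity_z(term_count=60, subcorpus_tokens=100, corpus_proportion=0.5))
```

In both cases a large positive value marks a term as over-represented in the subcorpus relative to the reference, which is why the two measures can be compared for convergent validity.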
About the journal:
The aim of Glottometrics is the quantification, measurement, and mathematical modeling of all kinds of language phenomena. We invite contributions on probabilistic or other mathematical models (e.g. graph-theoretic or optimization approaches) that make it possible to establish language laws which can be validated by testing statistical hypotheses.