Alina G. Monogarova, Tatyana A. Shiryaeva, Elena V. Tikhonova
The words that make fake stories go viral: A corpus-based approach to analyzing Russian Covid-19 disinformation
Russian Journal of Linguistics, published 2023-09-30. DOI: 10.22363/2687-0088-33757 (https://doi.org/10.22363/2687-0088-33757)
Abstract
Since the outbreak of the Covid-19 pandemic in 2020, the spread of the new virus has been accompanied by a growing infodemic that poses a danger to Internet users. Social media and online messengers have been instrumental in making fake stories about Covid-19 go viral. The lack of an efficient instrument for classifying digital texts as true or fake remains a major challenge. Deceptive content and its specific characteristics attract the attention of many linguists, making it one of the most popular contemporary topics in corpus-based research. This paper explores the language of viral Covid-related fake stories and, using quantitative and qualitative approaches to text analysis, identifies specific linguistic features that distinguish fake stories from real (authentic) news. The study draws on a self-compiled diachronic corpus of misleading Russian coronavirus-related social media posts (a target corpus of 897 texts), which were shared virally by Russian users through social media platforms and mobile messengers from March 2020 to March 2022, and on a reference corpus of genuine materials about the virus. First, we compared the two corpora using an interpretable set of features across language levels to determine whether there is evidence of significant variation between the language of fake and real news. Then, we used frequency profiling to extract other over-represented groups of words from both corpora. Finally, we analyzed the corresponding contexts to establish whether these features can be considered linguistic trends in Russian Covid-related fake story making. The findings on the role of these over-represented groups of words in fake narratives about the coronavirus show that frequency profiling is effective in identifying lexical patterns of the language of deception.
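The frequency-profiling step can be illustrated with a minimal sketch. The abstract does not name the statistic the authors used, so this example assumes the log-likelihood (G2) keyness score that is common in corpus comparison. The function names, the 3.84 cut-off (the chi-square critical value for p < 0.05 with one degree of freedom), and the toy English tokens are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def log_likelihood(a, b, total_a, total_b):
    """G2 score for a word seen a times in the target corpus
    and b times in the reference corpus (assumed statistic)."""
    # Expected counts if the word were equally likely in both corpora.
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_a)
    if b > 0:
        g2 += b * math.log(b / expected_b)
    return 2 * g2


def overrepresented(target_tokens, reference_tokens, threshold=3.84):
    """Return (word, G2) pairs for words over-represented in the target
    corpus, sorted from the highest score to the lowest."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    n_target = sum(target.values())
    n_reference = sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        # Keep only words whose relative frequency is higher in the target.
        if a / n_target > b / n_reference:
            g2 = log_likelihood(a, b, n_target, n_reference)
            if g2 >= threshold:
                scored.append((word, g2))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data standing in for tokenized fake and genuine texts (hypothetical).
fake_tokens = ["secret"] * 20 + ["virus"] * 80
real_tokens = ["virus"] * 95 + ["vaccine"] * 5
keywords = overrepresented(fake_tokens, real_tokens)
```

In this toy run only "secret" passes the filter, because "virus" is relatively more frequent in the reference tokens. Running the same function with the two arguments swapped gives the words over-represented in the reference corpus, which matches the abstract's description of extracting such words from both corpora.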