Barbara Lewandowska-Tomaszczyk, A. Bączkowska, Chaya Liebeskind, Giedre Valunaite Oleskeviciene, Slavko Žitnik
Lodz Papers in Pragmatics, vol. 19, no. 1, pp. 7-48. Published 2023-05-01. DOI: https://doi.org/10.1515/lpp-2023-0002
An integrated explicit and implicit offensive language taxonomy
Abstract
The current study presents an integrated model of explicit and implicit offensive language taxonomy. First, it focuses on a definitional revision and enrichment of the explicit offensive language taxonomy by reviewing the available corpora and comparing the tagging schemas applied in them. For its offensive language categorization schemata, the study relies mainly on the categories originally proposed by Zampieri et al. (2019). After an explanation of the semantic differences between particular concepts used in the tagging systems and an analysis of theoretical frameworks, a finite set of classes is presented that covers aspects of offensive language representation, along with linguistically sound explanations (Lewandowska-Tomaszczyk et al. 2021). In the analytic procedure, offensive discourse is first distinguished from non-offensive discourse, followed by the question of the offence Target and the subsequent categorization levels and sublevels. Based on relevant data generated from Sketch Engine (https://www.sketchengine.eu/ententen-english-corpus/), we propose offensive language as the superordinate category in our system, with 17 hierarchically arranged subcategories. The categories are taxonomically structured into 4 levels and verified with the use of neural-based (lexical) embeddings. Together with a taxonomy of implicit offensive language and its subcategorization levels, which has received little scholarly attention until now, the categorization is exemplified in samples of offensive discourse from selected English social media materials, i.e., 25 publicly available web-based hate speech datasets (consult Appendix 1 for a complete list). The study discusses the offensive category levels (types of offence, targets, etc.) and aspects (offensive language property clusters), as well as the categories of explicitness and implicitness, and proposes a computationally verified integrated explicit and implicit offensive language taxonomy.