The Ecological Fallacy in Annotation: Modeling Human Label Variation goes beyond Sociodemographics

Annual Meeting of the Association for Computational Linguistics | Pub Date: 2023-06-20 | DOI: 10.48550/arXiv.2306.11559

Authors: Matthias Orlikowski, Paul Röttger, P. Cimiano, Dirk Hovy
Affiliations: Bielefeld University; U. Oxford; Computing Sciences Department, Bocconi University, Milan, Italy


Citations: 2

Abstract



Many NLP tasks exhibit human label variation, where different annotators give different labels to the same texts. This variation is known to depend, at least in part, on the sociodemographics of annotators. Recent research aims to model individual annotator behaviour rather than predicting aggregated labels, and we would expect that sociodemographic information is useful for these models. On the other hand, the ecological fallacy states that aggregate group behaviour, such as the behaviour of the average female annotator, does not necessarily explain individual behaviour. To account for sociodemographics in models of individual annotator behaviour, we introduce group-specific layers to multi-annotator models. In a series of experiments for toxic content detection, we find that explicitly accounting for sociodemographic attributes in this way does not significantly improve model performance. This result shows that individual annotation behaviour depends on much more than just sociodemographics.
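The idea of adding group-specific layers to a multi-annotator model can be illustrated with a small forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not say where the group layers sit, how the text is encoded, or how annotators are represented. Here we assume a shared text representation (a hand-written feature vector standing in for an encoder output), one dense layer per sociodemographic group, and one output head per annotator. The class name `GroupLayerMultiAnnotatorModel`, the annotator ids, and the group names are all hypothetical.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer: returns W·x + b as a list of floats."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(rng, n_out, n_in):
    """Randomly initialised (untrained) weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class GroupLayerMultiAnnotatorModel:
    """Toy forward pass: shared text features -> group layer -> annotator head.

    The layer placement is an assumption made for illustration only.
    """

    def __init__(self, annotator_groups, dim, seed=0):
        rng = random.Random(seed)
        # Hypothetical mapping: annotator id -> sociodemographic group id.
        self.annotator_groups = dict(annotator_groups)
        # One layer per group, shared by every annotator in that group.
        groups = sorted(set(self.annotator_groups.values()))
        self.group_layers = {g: init_layer(rng, dim, dim) for g in groups}
        # One head per annotator, producing a single toxicity logit.
        self.annotator_heads = {
            a: init_layer(rng, 1, dim) for a in sorted(self.annotator_groups)
        }

    def predict(self, text_features, annotator):
        """Probability that this annotator labels the text as toxic."""
        group = self.annotator_groups[annotator]
        group_w, group_b = self.group_layers[group]
        hidden = relu(linear(group_w, group_b, text_features))
        head_w, head_b = self.annotator_heads[annotator]
        return sigmoid(linear(head_w, head_b, hidden)[0])


# Hypothetical annotators: a1 and a2 share a group, a3 belongs to another.
annotator_groups = {"a1": "group_x", "a2": "group_x", "a3": "group_y"}
model = GroupLayerMultiAnnotatorModel(annotator_groups, dim=4)
features = [0.2, -0.1, 0.7, 0.4]  # stand-in for an encoder's text representation
predictions = {a: model.predict(features, a) for a in annotator_groups}
print(predictions)
```

In this sketch, annotators `a1` and `a2` pass through the same group layer but keep separate heads, so the model can still predict different labels for two members of the same group. That separation is the point the ecological fallacy raises: a group-level component captures aggregate behaviour, while the annotator-level component has to account for whatever individual variation remains.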
