The Limits of Abstract Evaluation Metrics: The Case of Hate Speech Detection

Alexandra Olteanu, Kartik Talamadupula, Kush R. Varshney
Published in: Proceedings of the 2017 ACM on Web Science Conference, 2017-06-25
DOI: 10.1145/3091478.3098871
Citations: 38

Abstract

Wagstaff (2012) draws attention to the pervasiveness of abstract evaluation metrics that explicitly ignore or remove problem specifics. While such metrics allow practitioners to compare numbers across application domains, they offer limited insight into the impact of algorithmic decisions on humans and into how humans perceive the algorithm's correctness. Even for problems that are mathematically the same, both the real cost of (mathematically) identical errors and their perceived cost to users may vary significantly with the specifics of each problem domain and with the user perceiving the result. While the real cost of errors has been considered previously, little attention has been paid to perceived cost. We advocate for the inclusion of human-centered metrics that elicit error costs from humans from two perspectives: the nature of the error and the user context. Focusing on hate speech detection on social media, we demonstrate that even when performance as measured by an abstract metric such as precision is held fixed, user perception of correctness varies greatly depending on the nature of the errors and on user characteristics.
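The central claim, that two systems with identical precision can be perceived very differently, can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the error categories ("offensive", "benign"), the per-group cost tables, the toy counts, and the `perceived_correctness` score itself are all hypothetical.

```python
# Illustrative sketch only: categories, costs, and counts are hypothetical.

def precision(flagged):
    """Fraction of flagged items whose true label is 'hate'."""
    return sum(1 for label in flagged if label == "hate") / len(flagged)

def perceived_correctness(flagged, perceived_cost):
    """One minus the mean perceived cost of the flagged items.

    `perceived_cost` maps a true label to a cost in [0, 1] that a user
    group assigns to seeing that kind of item flagged as hate speech.
    """
    total = sum(perceived_cost[label] for label in flagged)
    return 1 - total / len(flagged)

# True labels of the ten items each hypothetical system flagged.
# Both make two false positives, but of a different nature.
system_a = ["hate"] * 8 + ["offensive"] * 2   # near-miss errors
system_b = ["hate"] * 8 + ["benign"] * 2      # clear-cut errors

# Hypothetical cost tables for two user groups (the user context).
cost_by_group = {
    "group_1": {"hate": 0.0, "offensive": 0.25, "benign": 1.0},
    "group_2": {"hate": 0.0, "offensive": 0.75, "benign": 1.0},
}

for name, flagged in [("A", system_a), ("B", system_b)]:
    print(f"system {name}: precision = {precision(flagged):.2f}")
    for group, costs in cost_by_group.items():
        score = perceived_correctness(flagged, costs)
        print(f"  {group}: perceived correctness = {score:.3f}")
```

Both systems have the same precision, yet the score differs by system (nature of the error) and by group (user context). In practice the cost tables would have to be elicited from people rather than set by hand as they are here.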