Overview of the HASOC Track at FIRE 2020: Hate Speech and Offensive Language Identification in Tamil, Malayalam, Hindi, English and German

Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation. Pub Date: 2020-12-16. DOI: 10.1145/3441501.3441517

Thomas Mandl, Sandip J Modha, M. Anandkumar, Bharathi Raja Chakravarthi

Citations: 167

Abstract

This paper presents the HASOC track and its two sub-tracks. HASOC is dedicated to evaluating technology for identifying offensive language and hate speech. It creates test collections for low-resource languages, with English included for comparison. The first sub-track continued the work from 2019 and provided a testbed of Twitter posts in Hindi, German and English. The second sub-track created test resources for Tamil and Malayalam in native and Latin script, with posts extracted mainly from YouTube and Twitter. Both sub-tracks attracted considerable interest: over 40 research groups participated and described their approaches in papers. In this overview, we present the tasks, the data and the main results.



