Towards an Organically Growing Hate Speech Dataset in Hate Speech Detection Systems in a Smart Mobility Application
Authors: Ahmad Alsamman, Andreas Schmitz, M. Wimmer
Published in: Proceedings of the 24th Annual International Conference on Digital Government Research, 2023-07-11
DOI: 10.1145/3598469.3598473 (https://doi.org/10.1145/3598469.3598473)
Citation count: 1
Abstract
The automatic detection of hate speech online poses several challenges. Chief among them is that hate speech periodically changes its targets and its form. While a lack of available training data is a general issue in many natural language processing applications, this constant shift amplifies the problem, especially given how difficult it is to produce well-labelled datasets. Based on the concepts of quarantining hate speech and of integrating a linguistics expert into a smart mobility service provided in an administrative district in Germany, this paper proposes an approach for improving the training dataset both quantitatively and qualitatively in a running smart mobility app, the SWIA app. This proactive approach provides a long-term solution for hate speech detection models that rely on labelled datasets for training. The paper also discusses technical and practical challenges that the approach leaves unanswered.