Annotation Framework for Hate Speech Identification in Tweets: Case Study of Tweets During Kenyan Elections
Edward Ombui, Moses Karani, Lawrence Muchemi
2019 IST-Africa Week Conference (IST-Africa), published 2019-05-08
DOI: 10.23919/ISTAFRICA.2019.8764868 (https://doi.org/10.23919/ISTAFRICA.2019.8764868)
Citations: 4
Abstract
Given the colossal volume of user-generated content on social media, it has become increasingly difficult to monitor hateful content published in public online spaces, especially in Kenya during electioneering periods. It is therefore crucial to automate hate speech identification in order to manage the volume, variety, veracity and velocity of this content. In this research, we propose a supervised machine learning approach in which the annotation of the training data set is critical to the performance of the trained classifier. We therefore develop an annotation framework based on Sternberg’s (2003) hate theory and test it by using it to classify about 5k tweets, with 3 human annotators per tweet. Preliminary results indicate an intercoder reliability of 0.5027, measured with Krippendorff’s alpha.
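The reliability figure above is Krippendorff’s alpha, α = 1 − D_o/D_e, which compares the disagreement observed among coders (D_o) with the disagreement expected by chance (D_e). The following is a minimal Python sketch for nominal labels, not the authors’ tooling. It assumes each tweet is a row listing the labels its coders assigned, with None for a missing judgement. The function name and the category names ("hate", "offensive", "neither") are hypothetical placeholders, not the paper’s Sternberg-based categories, and the four toy tweets are made-up data.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from itertools import permutations
from typing import Optional


def krippendorff_alpha_nominal(units: Sequence[Sequence[Optional[Hashable]]]) -> float:
    """Krippendorff's alpha for nominal labels (hypothetical helper name).

    `units` holds one row per tweet; each row lists the labels assigned by
    the coders, with None marking a missing judgement.
    """
    # Coincidence matrix: o[(c, k)] accumulates ordered value pairs
    # contributed by different coders within the same unit.
    coincidences: Counter = Counter()
    for row in units:
        values = [v for v in row if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit with fewer than two values cannot be paired
        # permutations() pairs positions, so repeated labels still count
        for c, k in permutations(values, 2):
            coincidences[(c, k)] += 1.0 / (m - 1)

    # Marginal totals n_c and the number of pairable values n.
    totals: Counter = Counter()
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())

    # Nominal metric: any two different labels disagree with weight 1.
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c in totals for k in totals if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1.0 - (n - 1) * observed / expected


# Toy data: 4 tweets, 3 coders each, hypothetical labels (not from the paper).
units = [
    ["hate", "hate", "hate"],
    ["hate", "neither", "hate"],
    ["offensive", "offensive", None],  # third coder gave no label
    ["neither", "neither", "neither"],
]
alpha = krippendorff_alpha_nominal(units)
print(round(alpha, 4))  # prints 0.7368
```

An alpha of 1 indicates perfect agreement and 0 indicates agreement no better than chance. Building the coincidence matrix rather than counting pairwise matches lets the same code handle tweets where a coder skipped a label, as in the third toy row.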