Transformer Models for Recognizing Abusive Language: An investigation and review on Tweeteval and SOLID dataset
Fabeela Ali Rawther, Geevarghese Titus
2023 Second International Conference on Electrical, Electronics, Information and Communication Technologies (ICEEICT), published 2023-04-05
DOI: https://doi.org/10.1109/ICEEICT56924.2023.10157848
Social networking communities have become very popular among children and the elderly alike. In this era of social media, comments, opinions, reviews and other communications stream through the most common messaging platforms, such as Twitter, the Meta-owned WhatsApp, Facebook and Instagram, Snapchat, Telegram and YouTube comments. In this paper we review the different methods and models used to identify offensive language across different datasets. Offensive language detection is a tedious task because it is country- and language-specific, and the corpora used to identify offensiveness and abusiveness do not cover all word usages. We present a comparative study of different text-based methods for detecting whether a post is offensive. The detection of abusive language remains an unsolved and challenging problem for researchers in Natural Language Processing (NLP). Abusive language has become one of the reasons for increased levels of mental instability among people ranging from teenagers to the elderly, and crime via social media has increased considerably compared with earlier days. Studies and surveys show that recognizing the structure and context of the language is the best way to solve this problem to an extent. The paper aims to compare four recent transformer models pretrained and fine-tuned for offensive language detection on the TweetEval dataset, namely DistilBERT, RoBERTa, DistilRoBERTa and DeBERTa. The performance of all the models was limited by the size of the training data used, but each was optimized by tuning hyperparameters during training. The models are limited to English-language offensive words, and recent work is under way on multilingual tweets in both text and speech processing.
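A comparison of several classifiers on a binary offensive / not-offensive task can be scored along the following lines. This is only a minimal sketch: the abstract does not state which metric the authors used, so macro-averaged F1 is an assumption here, chosen because it is a common choice when the offensive class is the minority. The function names, the placeholder model names `model_a` and `model_b`, and the toy labels are all hypothetical and are not results from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def f1_for_label(gold: Sequence[int], pred: Sequence[int], label: int) -> float:
    """F1 score for one class, treating `label` as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[int], pred: Sequence[int],
             labels: Sequence[int] = (0, 1)) -> float:
    """Unweighted mean of the per-class F1 scores (assumed metric)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(f1_for_label(gold, pred, label) for label in labels) / len(labels)


def rank_models(gold: Sequence[int],
                predictions: Dict[str, List[int]]) -> List[Tuple[str, float]]:
    """Score every model's predictions and sort from best to worst."""
    scores = {name: macro_f1(gold, pred) for name, pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data only: 1 = offensive, 0 = not offensive (hypothetical labels).
    toy_gold = [1, 0, 0, 1, 0, 1]
    toy_predictions = {
        "model_a": [1, 0, 0, 1, 0, 0],  # hypothetical model output
        "model_b": [1, 1, 1, 1, 0, 1],  # hypothetical model output
    }
    for name, score in rank_models(toy_gold, toy_predictions):
        print(f"{name}: macro-F1 = {score:.3f}")
```

In practice the prediction lists would come from running each fine-tuned model over the same held-out test split, so that the scores are directly comparable.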