{"title":"Creating a New Dataset for the Classification of Cyber Bullying","authors":"Çilem KOÇAK, Tuncay YİĞİT, Mehmet BİLEN","doi":"10.54569/aair.1206144","DOIUrl":null,"url":null,"abstract":"Regardless of young or old, people have quickly stepped into the world of internet with today's communication technologies such as phones, tablets, computers and smart devices. As the place of the Internet in people's lives increases, social media platforms are diversifying and users want to take part in these platforms. With the increase in the number of social media users, some negativities are encountered. The most important problem encountered in social media platforms is cyber bullying. Although cyber bullying seems to be a daily dialogue between social media users or between groups, the situation of encountering is increasing day by day with the diversity of shared information, content and agenda social media environments. With the development of technology, it is necessary to develop a platform that detects bullying with artificial intelligence technologies. One of the biggest difficulties in text classification problems that we encounter during the development of these platforms is the need to train the artificial intelligence algorithm to be used with labeled data. In this study, 21 different people, including journalists, athletes, scientists, doctors, politicians, comedians, social media phenomena, and artists who actively use social media, were selected in order to create the necessary dataset for training the models to be developed to detect cyber bullying situations. The public messages (mentions) of these 21 people sent via Twitter were compiled. After filtering the repetitive and meaningless messages sent by bot accounts out of 10500 tweets compiled, the number of messages in the dataset decreased to 7706. The labeling process, which is necessary for the dataset to be used for training and testing purposes in classification processes, was carried out by three independent people who were given preliminary information about cyberbullying (1=Includes Cyber bullying, 0=Does not include Cyber bullying). The majority of the tags, which were read and assigned by 3 different people, were accepted as the final class of the relevant message. Afterwards, the dataset was preprocessed in accordance with the principles of natural language processing and made suitable for classification algorithms. The findings obtained after the classification processes performed with the basic classification algorithms are shared. When the findings are examined, it is understood that the data set created has the competence to be used in the detection and prevention of cyber bullying. In this context, it is predicted that training specially developed and optimized artificial intelligence algorithms with the relevant dataset for the detection of cyberbullying will greatly increase the success rate.","PeriodicalId":286492,"journal":{"name":"Advances in Artificial Intelligence Research","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in Artificial Intelligence Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.54569/aair.1206144","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Regardless of young or old, people have quickly stepped into the world of internet with today's communication technologies such as phones, tablets, computers and smart devices. As the place of the Internet in people's lives increases, social media platforms are diversifying and users want to take part in these platforms. With the increase in the number of social media users, some negativities are encountered. The most important problem encountered in social media platforms is cyber bullying. Although cyber bullying seems to be a daily dialogue between social media users or between groups, the situation of encountering is increasing day by day with the diversity of shared information, content and agenda social media environments. With the development of technology, it is necessary to develop a platform that detects bullying with artificial intelligence technologies. One of the biggest difficulties in text classification problems that we encounter during the development of these platforms is the need to train the artificial intelligence algorithm to be used with labeled data. In this study, 21 different people, including journalists, athletes, scientists, doctors, politicians, comedians, social media phenomena, and artists who actively use social media, were selected in order to create the necessary dataset for training the models to be developed to detect cyber bullying situations. The public messages (mentions) of these 21 people sent via Twitter were compiled. After filtering the repetitive and meaningless messages sent by bot accounts out of 10500 tweets compiled, the number of messages in the dataset decreased to 7706. The labeling process, which is necessary for the dataset to be used for training and testing purposes in classification processes, was carried out by three independent people who were given preliminary information about cyberbullying (1=Includes Cyber bullying, 0=Does not include Cyber bullying). The majority of the tags, which were read and assigned by 3 different people, were accepted as the final class of the relevant message. Afterwards, the dataset was preprocessed in accordance with the principles of natural language processing and made suitable for classification algorithms. The findings obtained after the classification processes performed with the basic classification algorithms are shared. When the findings are examined, it is understood that the data set created has the competence to be used in the detection and prevention of cyber bullying. In this context, it is predicted that training specially developed and optimized artificial intelligence algorithms with the relevant dataset for the detection of cyberbullying will greatly increase the success rate.