Mohan K. Mali, Ranjeet R. Pawar, Sandeep A. Shinde, Satish D. Kale, Sameer V. Mulik, Asmita A. Jagtap, Pratibha A. Tambewagh, Punam U. Rajput
Expert Systems with Applications, Volume 262, Article 125641 (published 2024-11-05)
Impact factor: 7.5; JCR: Q1 (Computer Science, Artificial Intelligence)
DOI: 10.1016/j.eswa.2024.125641
https://www.sciencedirect.com/science/article/pii/S0957417424025089
Automatic detection of cyberbullying behaviour on social media using Stacked Bi-Gru attention with BERT model
Cyberbullying has drawn increasing attention as social media usage has grown. It has been linked to teen suicide, among other serious and harmful effects on a person’s life. With appropriate natural language processing and machine learning techniques, bullying content can be identified proactively, helping to reduce and eventually eradicate cyberbullying. Accordingly, the article proposes an automated deep-learning model for detecting aggressive activity in cyberbullying. Data are first drawn from the Formspring, Instagram and MySpace social media datasets and then passed to preprocessing. Several preprocessing steps clean the raw data: removing stop words, whitespace and punctuation, and converting the comments to lowercase. Lexical Density (LD) is one of the metrics generally used to gauge language complexity; the study uses Feature Density (FD) to estimate how complex the natural language datasets are under linguistically backed preprocessing. After preprocessing, the data are passed to feature selection, which decides which features or attributes to include in predictive modelling and which to leave out. For this step the article proposes a Binary Chimp Optimization (BCO)-based Feature Selection (BCO-FSS) technique, which selects a subset of features to improve classification performance. The selected features are then used for cyberbullying behaviour detection. To identify cyberbullying text content on social media, the article proposes Stacked Bidirectional Gated Recurrent Unit (SBiGRU) Attention, which uses a Bi-GRU to learn spatial location information and sequential semantic representations. Additionally, the BERT model is employed as a base classifier to recognize and categorise aggressive behaviour in the textual content. Simulations are run in Matlab. The experiment reports accuracy, precision, recall, and F1-score of 99.12%, 94.73%, 97.45%, and 93.91%, respectively.
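The preprocessing steps and the Feature Density measure named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (the abstract reports Matlab). It assumes that features are word unigrams and that Feature Density is the number of unique features divided by the total number of feature occurrences in the corpus. The stop-word list and the names `preprocess` and `feature_density` are hypothetical.

```python
import string

# Tiny illustrative stop-word list (an assumption; the paper's list is not given).
STOP_WORDS = {"a", "an", "the", "is", "are", "you", "to", "and", "of", "so"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(comment):
    """Lowercase a comment, strip punctuation, collapse whitespace, drop stop words."""
    text = comment.lower().translate(_PUNCT_TABLE)
    tokens = text.split()  # split() with no argument also collapses repeated whitespace
    return [t for t in tokens if t not in STOP_WORDS]


def feature_density(docs):
    """Unique features / total feature occurrences over a tokenised corpus.

    Assumed definition; returns 0.0 for an empty corpus.
    """
    all_tokens = [t for doc in docs for t in doc]
    if not all_tokens:
        return 0.0
    return len(set(all_tokens)) / len(all_tokens)


comments = ["You are SO   dumb!!!", "The game is great, so great."]
docs = [preprocess(c) for c in comments]
print(docs)                   # [['dumb'], ['game', 'great', 'great']]
print(feature_density(docs))  # 0.75 (3 unique tokens out of 4)
```

A lower value means tokens repeat more often across the corpus; a value near 1 means almost every token is unique.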
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.