Detection of violent speech against women in Mexican tweets using an active learning approach
Grisel Miranda-Piña; Roberto Alejo; Eréndira Rendón-Lara; Vicente García
IEEE Latin America Transactions (Journal Article), published 2024-03-14
DOI: 10.1109/TLA.2024.10473002
URL: https://ieeexplore.ieee.org/document/10473002/
Impact factor: 1.3; JCR: Q3 (Computer Science, Information Systems)
Citation count: 0
Abstract
In Latin American and Caribbean states, verbal violence against women on social networks such as Twitter is a serious threat that has been addressed through social norms, public policies, and social movements. Nevertheless, effective, automatic, real-time detection of violent tweets remains a challenge. Traditional machine learning algorithms proposed for such social issues are trained in a static manner. However, Twitter is a dynamic environment where a vast number of tweets are generated each second, so it requires machine learning algorithms that can exploit this pool of unlabeled data and incorporate it into the model through continuous updates. This paper explores an active learning method based on uncertainty sampling, which identifies the most confusing tweets to be labeled by an expert in real time. This focused selection prioritizes the data used to train a multilayer perceptron, which can then achieve better performance with fewer training samples. Experimental results show that including the new samples is promising, increasing the AUC from 0.8712 to 0.8833.
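The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which uncertainty measure is used, so the least-confidence criterion below is an assumption, and the function names and the probabilities in `pool_probs` are hypothetical.

```python
from typing import List, Sequence


def uncertainty_scores(probs: Sequence[float]) -> List[float]:
    """Least-confidence score for binary predictions: 1 - max(p, 1 - p).

    The score is highest (0.5) when the model outputs p = 0.5 and
    lowest (0.0) when it is fully confident in either class.
    """
    return [1.0 - max(p, 1.0 - p) for p in probs]


def select_most_uncertain(probs: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k pool items with the least confident predictions."""
    scores = uncertainty_scores(probs)
    ranked = sorted(range(len(probs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical classifier outputs P(violent) for five unlabeled tweets.
pool_probs = [0.97, 0.52, 0.10, 0.45, 0.80]

# The two tweets closest to the decision boundary would be sent to the expert.
query_indices = select_most_uncertain(pool_probs, k=2)
print(query_indices)  # [1, 3]
```

In a full loop, the expert's labels for the queried tweets would be appended to the training set and the classifier retrained or updated before the next batch is scored.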
About the journal:
IEEE Latin America Transactions (IEEE LATAM) is an interdisciplinary journal focused on disseminating original, high-quality research papers and review articles in Spanish and Portuguese on emerging topics in three main areas: Computing, Electric Energy, and Electronics. Sub-areas of the journal include, but are not limited to: automatic control, communications, instrumentation, artificial intelligence, power and industrial electronics, fault diagnosis and detection, transportation electrification, internet of things, electrical machines, circuits and systems, biomedicine and biomedical / haptic applications, secure communications, robotics, sensors and actuators, computer networks, and smart grids.