A novel algorithm for sarcasm detection using supervised machine learning approach

JCR quartile: Q3 (Engineering)

AIMS Electronics and Electrical Engineering. Published: 2022-01-01. DOI: 10.3934/electreng.2022021

A. Amer, Tamanna Siddiqu


Citations: 1

Abstract

Sarcasm is the use of words that convey the opposite of what the speaker means, often in order to insult someone. Detecting sarcasm in social networks (SNs) such as Twitter is a significant task because it supports the analysis of tweets with natural language processing (NLP). Many existing methods focus only on content-based features of sarcastic words and leave lexical and context-based features out. This loses part of the semantics of the terms in a sarcastic expression. This study proposes an improved model for detecting sarcasm in SNs. We engineered three feature sets: context-based features, sarcasm-based features, and lexical features. The model is built from two novel algorithms arranged in two stages: the first algorithm preprocesses the SN data, and the second extracts the feature sets. We then applied several supervised machine learning (ML) classifiers, namely k-nearest neighbors (KNN), naïve Bayes (NB), support vector machine (SVM), and random forest (RF), to TF-IDF representations of the data. Model performance is evaluated with precision, accuracy, recall, and F1 score, each reported as a percentage. With lexical features alone, KNN achieved the highest accuracy of all classifiers, at 89.19%. Combining two feature sets (sarcastic and lexical) gave a slight improvement with the same KNN classifier, reaching 90.00% accuracy. Combining all three feature sets (sarcastic, lexical, and context) improved accuracy slightly further, to 90.51% with KNN. To measure the effect of the feature sets, we ran the model with each feature set individually, then with two feature sets combined, and finally with all three combined. Combining all three feature sets gave the best accuracy, obtained with the KNN classifier.
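The abstract pairs TF-IDF text representations with a KNN classifier, its best-performing model. The sketch below illustrates that pairing using only the Python standard library. It is not the authors' implementation, and several choices in it are our own assumptions:

- the cleaning rules in the hypothetical `preprocess` function;
- the smoothed IDF formula;
- cosine similarity as the KNN similarity measure;
- k = 3;
- the six invented training tweets.

It also works on raw tokens only, not on the paper's sarcastic, lexical, and context feature sets.

```python
import math
import re
from collections import Counter

def preprocess(text):
    """Hypothetical tweet cleaning: lowercase, drop URLs and @mentions,
    keep hashtag words without '#', and return alphabetic tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs first (they may contain '@')
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = text.replace("#", " ")              # '#blessed' -> 'blessed'
    return re.findall(r"[a-z']+", text)

def fit_idf(docs):
    """Smoothed IDF: log((1 + n) / (1 + df)) + 1, which is always >= 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

def tfidf(tokens, idf):
    """L2-normalised TF-IDF vector as a dict; out-of-vocabulary terms are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec

def cosine(a, b):
    # Both vectors are unit length (or empty), so the dot product is the cosine.
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def knn_predict(train_vecs, train_labels, query, k=3):
    """Majority label among the k training vectors most similar to `query`."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(train_vecs[i], query), reverse=True)
    return Counter(train_labels[i] for i in ranked[:k]).most_common(1)[0][0]

# Toy labelled tweets (1 = sarcastic, 0 = literal), invented for illustration only.
train = [
    ("Oh great, another Monday. I just LOVE waking up at 5am", 1),
    ("Wow, my phone died again. What a fantastic battery! #blessed", 1),
    ("Great, the train is late again. Just what I love @metro", 1),
    ("The library opens on Monday at 9am", 0),
    ("My phone battery lasts two days after the update", 0),
    ("The train to the city leaves every hour", 0),
]
docs = [preprocess(text) for text, _ in train]
labels = [label for _, label in train]
idf = fit_idf(docs)
vecs = [tfidf(d, idf) for d in docs]

def classify(text, k=3):
    """Hypothetical helper: clean, vectorise, and label one tweet."""
    return knn_predict(vecs, labels, tfidf(preprocess(text), idf), k)

print(classify("Wow, just what I love: a fantastic delay https://t.co/x"))  # -> 1
print(classify("City library opens every hour"))                            # -> 0
```

The first query shares words only with the sarcastic examples, so its three nearest neighbours are all labelled 1. The second query overlaps only with two literal tweets, which outvote the third, zero-similarity neighbour. In practice a library vectorizer and classifier would replace these helpers. The paper's feature sets would be added as extra columns alongside the TF-IDF vector.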

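The abstract reports precision, accuracy, recall, and F1 score as percentages. The helper below shows how these four numbers follow from confusion-matrix counts in a binary task. The abstract does not state which class was treated as positive or how the scores were averaged. Treating sarcastic (label 1) as the positive class is therefore our assumption, as are the function name `evaluate` and the ten example labels.

```python
def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 (as percentages) for a binary task,
    treating `positive` (assumed here: sarcastic) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    scores = {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
    return {name: round(100 * value, 2) for name, value in scores.items()}

# Hypothetical predictions on ten tweets (1 = sarcastic, 0 = literal).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(evaluate(y_true, y_pred))
# -> {'accuracy': 70.0, 'precision': 75.0, 'recall': 60.0, 'f1': 66.67}
```

The example gives 3 true positives, 1 false positive, 2 false negatives, and 4 true negatives. Accuracy is 7/10 and precision is 3/4. Recall is 3/5, and F1 is their harmonic mean. This shows that accuracy alone can hide a weaker recall on the sarcastic class.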