A comprehensive study of classification techniques for sarcasm detection on textual data

Anandkumar D. Dave, NIKITA PARITOSH DESAI
Published in: 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT)
Publication date: 2016-03-03
DOI: 10.1109/ICEEOT.2016.7755036 (https://doi.org/10.1109/ICEEOT.2016.7755036)
Citations: 26

Abstract

Over the last decade, much of the research in this area has addressed sentiment analysis of textual data available on the web. Sentiment analysis has its challenges, and one of them is sarcasm. Classifying sarcastic sentences is difficult because of the variation in how sarcasm is represented in text, and this can affect many applications based on natural language processing. Sarcasm is a form of expression that conveys a sentiment different from the one literally presented. In this study we identify the supervised classification techniques mainly used for sarcasm detection, together with their features. We also analyze the results of these techniques on textual data in various languages from review sites, social media sites, and micro-blogging sites. Furthermore, for each method studied, the paper analyzes the data set generation and feature selection processes used. We also carried out a preliminary experiment on detecting sarcastic sentences in Hindi. We trained an SVM classifier with 10-fold cross-validation, using simple bag-of-words features with TF-IDF as the frequency measure. This simple bag-of-words model correctly classified 50% of the sarcastic sentences. The preliminary experiment thus indicates that simple bag-of-words features are not sufficient for sarcasm detection.
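The feature setup named in the abstract (bag-of-words terms weighted by TF-IDF, evaluated over 10 folds) can be illustrated with a minimal sketch. This is not the authors' code: the whitespace tokenizer, the plain `tf * log(N / df)` weighting, the contiguous fold layout, and the function names `tfidf_vectors` and `k_fold_indices` are all assumptions made for illustration. The SVM training step is deliberately left out; in practice a library such as scikit-learn would likely be used for it.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors): one TF-IDF weighted bag-of-words dict per document."""
    # Naive whitespace tokenizer (assumption; real Hindi text would need proper tokenization).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency times inverse document frequency (unsmoothed variant).
        vec = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        vectors.append(vec)
    return sorted(df), vectors


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


if __name__ == "__main__":
    # Hypothetical toy sentences, not data from the paper.
    docs = ["great great service", "great delay again"]
    vocab, vecs = tfidf_vectors(docs)
    print(vocab)
    print(vecs[0])
    for train, test in k_fold_indices(25, k=10):
        print(len(train), len(test))
```

A term that occurs in every document (here "great") gets weight zero under this weighting, which hints at why surface word counts alone may carry little signal when sarcastic and non-sarcastic sentences share the same vocabulary.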