Integrating Sarcastic Language Datasets in Various Standards for Sarcasm Detection
Authors: Shih-Hung Wu, Xie-Sheng Hong
Venue: 2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI)
Publication date: 2023-08-01
DOI: 10.1109/IRI58017.2023.00022 (https://doi.org/10.1109/IRI58017.2023.00022)
Citation count: 0
Abstract
Sarcastic language is a special kind of figurative language that involves misperception of the text. Its ambiguity and specificity affect tasks in natural language processing and sentiment analysis, which makes sarcasm detection an important challenge. Different datasets apply very different standards for what counts as sarcasm. In this paper, we study the “generalizability” of sarcasm datasets. We compare six sarcasm datasets annotated by different research teams and use classification models based on RoBERTa to investigate how well each dataset generalizes to the others.
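The abstract does not spell out the evaluation protocol, so the following is only a minimal sketch of one common way to probe generalizability across datasets: train a classifier on one dataset and score it on the test split of every dataset. Everything here is an assumption for illustration: the toy keyword model stands in for a fine-tuned RoBERTa classifier, and the names `train_keyword_model`, `cross_dataset_matrix`, and the two tiny datasets "A" and "B" are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, int]  # (text, label): 1 = sarcastic, 0 = not sarcastic
Classifier = Callable[[str], int]


def train_keyword_model(train: List[Example]) -> Classifier:
    """Toy stand-in for fine-tuning RoBERTa (assumption, not the paper's model).

    It keeps the words that occur only in sarcastic training texts and
    predicts "sarcastic" whenever one of them appears.
    """
    sarcastic_words, other_words = set(), set()
    for text, label in train:
        words = set(text.lower().split())
        if label == 1:
            sarcastic_words |= words
        else:
            other_words |= words
    cues = sarcastic_words - other_words

    def predict(text: str) -> int:
        return int(any(word in cues for word in text.lower().split()))

    return predict


def accuracy(model: Classifier, test: List[Example]) -> float:
    """Fraction of test examples the model labels correctly."""
    correct = sum(model(text) == label for text, label in test)
    return correct / len(test)


def cross_dataset_matrix(
    datasets: Dict[str, Dict[str, List[Example]]]
) -> Dict[str, Dict[str, float]]:
    """Train on each dataset's train split, evaluate on every test split."""
    matrix: Dict[str, Dict[str, float]] = {}
    for source, source_splits in datasets.items():
        model = train_keyword_model(source_splits["train"])
        matrix[source] = {
            target: accuracy(model, target_splits["test"])
            for target, target_splits in datasets.items()
        }
    return matrix


# Hypothetical toy data, invented purely for this illustration.
datasets = {
    "A": {
        "train": [("oh great another meeting", 1), ("the meeting is at noon", 0)],
        "test": [("oh great more rain", 1), ("rain is expected at noon", 0)],
    },
    "B": {
        "train": [("yeah right that will work", 1), ("that will work fine", 0)],
        "test": [("yeah right nice plan", 1), ("nice plan for the trip", 0)],
    },
}

matrix = cross_dataset_matrix(datasets)
for source, row in matrix.items():
    print(source, row)
```

Rows are training datasets and columns are test datasets. A gap between the diagonal (same dataset) and off-diagonal (different dataset) scores is one rough indicator that the datasets follow different standards for sarcasm. With six datasets, the same loop would produce a 6 x 6 matrix.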