Reynier Ortega-Bueno, Elisabetta Fersini, Paolo Rosso
{"title":"基于变压器的模型对不同语言扰动的鲁棒性研究——以反语检测为例","authors":"Reynier Ortega-Bueno, Elisabetta Fersini, Paolo Rosso","doi":"10.1111/exsy.70062","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>This study investigates the robustness of Transformer models in irony detection addressing various textual perturbations, revealing potential biases in training data concerning ironic and non-ironic classes. The perturbations involve three distinct approaches, each progressively increasing in complexity. The first approach is word masking, which employs wild-card characters or utilises BERT-specific masking through the mask token provided by BERT models. The second approach is word substitution, replacing the bias word with a contextually appropriate alternative. Lastly, paraphrasing generates a new phrase while preserving the original semantic meaning. We leverage Large Language Models (GPT 3.5 Turbo) and human inspection to ensure linguistic correctness and contextual coherence for word substitutions and paraphrasing. The results indicate that models are susceptible to these perturbations, and paraphrasing and word substitution demonstrate the most significant impact on model predictions. The irony class appears to be particularly challenging for models when subjected to these perturbations. The SHAP and LIME methods are used to correlate variations in attribution scores with prediction errors. A notable difference in the Total Variation of attribution scores is observed between original examples and cases involving bias word substitution or masking. Among the corpora used, <i>TwSemEval2018</i> emerges as the most challenging. Regarding model performance, Transformer-based models such as RoBERTa and BERTweet demonstrate superior overall performance addressing these perturbations. This research contributes to understanding the robustness and limitations of irony detection models, highlighting areas for improvement in model design and training data curation.</p>\n </div>","PeriodicalId":51053,"journal":{"name":"Expert Systems","volume":"42 6","pages":""},"PeriodicalIF":3.0000,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"On the Robustness of Transformer-Based Models to Different Linguistic Perturbations: A Case of Study in Irony Detection\",\"authors\":\"Reynier Ortega-Bueno, Elisabetta Fersini, Paolo Rosso\",\"doi\":\"10.1111/exsy.70062\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n <p>This study investigates the robustness of Transformer models in irony detection addressing various textual perturbations, revealing potential biases in training data concerning ironic and non-ironic classes. The perturbations involve three distinct approaches, each progressively increasing in complexity. The first approach is word masking, which employs wild-card characters or utilises BERT-specific masking through the mask token provided by BERT models. The second approach is word substitution, replacing the bias word with a contextually appropriate alternative. Lastly, paraphrasing generates a new phrase while preserving the original semantic meaning. We leverage Large Language Models (GPT 3.5 Turbo) and human inspection to ensure linguistic correctness and contextual coherence for word substitutions and paraphrasing. 
The results indicate that models are susceptible to these perturbations, and paraphrasing and word substitution demonstrate the most significant impact on model predictions. The irony class appears to be particularly challenging for models when subjected to these perturbations. The SHAP and LIME methods are used to correlate variations in attribution scores with prediction errors. A notable difference in the Total Variation of attribution scores is observed between original examples and cases involving bias word substitution or masking. Among the corpora used, <i>TwSemEval2018</i> emerges as the most challenging. Regarding model performance, Transformer-based models such as RoBERTa and BERTweet demonstrate superior overall performance addressing these perturbations. This research contributes to understanding the robustness and limitations of irony detection models, highlighting areas for improvement in model design and training data curation.</p>\\n </div>\",\"PeriodicalId\":51053,\"journal\":{\"name\":\"Expert Systems\",\"volume\":\"42 6\",\"pages\":\"\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2025-04-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Expert Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1111/exsy.70062\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Expert Systems","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/exsy.70062","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
On the Robustness of Transformer-Based Models to Different Linguistic Perturbations: A Case of Study in Irony Detection
This study investigates the robustness of Transformer models for irony detection under various textual perturbations, revealing potential biases in the training data concerning the ironic and non-ironic classes. The perturbations follow three distinct approaches of progressively increasing complexity. The first approach is word masking, which replaces the target word either with wild-card characters or with the mask token provided by BERT models. The second approach is word substitution, replacing the bias word with a contextually appropriate alternative. Lastly, paraphrasing generates a new phrase while preserving the original semantic meaning. We leverage a large language model (GPT-3.5 Turbo) together with human inspection to ensure linguistic correctness and contextual coherence of the word substitutions and paraphrases. The results indicate that the models are susceptible to these perturbations, with paraphrasing and word substitution having the most significant impact on model predictions. The irony class appears to be particularly challenging for the models when subjected to these perturbations. The SHAP and LIME methods are used to correlate variations in attribution scores with prediction errors. A notable difference in the Total Variation of attribution scores is observed between original examples and cases involving bias word substitution or masking. Among the corpora used, TwSemEval2018 emerges as the most challenging. Regarding model performance, Transformer-based models such as RoBERTa and BERTweet demonstrate superior overall performance in handling these perturbations. This research contributes to understanding the robustness and limitations of irony detection models, highlighting areas for improvement in model design and training data curation.
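To make the perturbation approaches concrete, the following is a minimal sketch of the first two (masking and substitution) applied to a toy ironic sentence. The helper names and the choice of bert-base-uncased are assumptions for illustration; note also that the paper delegates substitution and paraphrasing to GPT-3.5 Turbo with human inspection, whereas this sketch uses a masked language model so that it stays self-contained.

```python
# Illustrative sketch of the masking and substitution perturbations.
# Function names and the model choice are assumptions, not the
# authors' actual implementation.
from transformers import pipeline

# Masked-LM pipeline used both for BERT-specific masking and for
# proposing contextually plausible substitutes.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def wildcard_mask(text: str, target: str) -> str:
    """Approach 1a: hide the bias word behind wild-card characters."""
    return text.replace(target, "*" * len(target))

def bert_mask(text: str, target: str) -> str:
    """Approach 1b: replace the bias word with BERT's mask token."""
    return text.replace(target, fill_mask.tokenizer.mask_token)

def substitute_word(text: str, target: str) -> str:
    """Approach 2: swap the bias word for a contextually plausible
    alternative proposed by the masked language model."""
    candidates = fill_mask(bert_mask(text, target))
    # Take the highest-scoring candidate that actually changes the word.
    for c in candidates:
        if c["token_str"].strip().lower() != target.lower():
            return c["sequence"]
    return text

# Approach 3 (paraphrasing) is not sketched here: generating a new
# phrase with the same meaning is handled by an LLM in the paper.
sentence = "What a wonderful day to miss the bus again."
print(wildcard_mask(sentence, "wonderful"))
print(bert_mask(sentence, "wonderful"))
print(substitute_word(sentence, "wonderful"))
```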
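The Total Variation comparison of attribution scores can likewise be sketched as below. The normalisation of absolute attributions into a distribution over token positions is an assumption, and the exact definition in the paper may differ; in practice the two vectors would come from SHAP or LIME explanations of the original and perturbed inputs, which this sketch assumes are aligned token-for-token (as holds for one-for-one masking or substitution).

```python
# Minimal sketch of comparing attribution scores between an original
# example and its perturbed counterpart via Total Variation.
import numpy as np

def total_variation(orig_attr: np.ndarray, pert_attr: np.ndarray) -> float:
    """Total Variation between two attribution vectors, each normalised
    (by absolute value) into a distribution over token positions.
    Assumes both vectors cover the same token positions."""
    p = np.abs(orig_attr) / (np.abs(orig_attr).sum() + 1e-12)
    q = np.abs(pert_attr) / (np.abs(pert_attr).sum() + 1e-12)
    return 0.5 * float(np.abs(p - q).sum())

# Toy attribution vectors: masking the bias word at position 2 shifts
# importance away from it and onto the surrounding tokens.
orig = np.array([0.05, 0.10, 0.70, 0.10, 0.05])
pert = np.array([0.20, 0.25, 0.10, 0.25, 0.20])
print(f"Total Variation: {total_variation(orig, pert):.3f}")
```

A large Total Variation between the original and perturbed explanations signals that the model's evidence has shifted, which is the kind of variation the study correlates with prediction errors.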
Journal introduction:
Expert Systems: The Journal of Knowledge Engineering publishes papers dealing with all aspects of knowledge engineering, including individual methods and techniques in knowledge acquisition and representation, and their application in the construction of systems – including expert systems – based thereon. Detailed scientific evaluation is an essential part of any paper.
As well as traditional application areas, such as Software and Requirements Engineering, Human-Computer Interaction, and Artificial Intelligence, we are aiming at the new and growing markets for these technologies, such as Business, Economy, Market Research, and Medical and Health Care. The shift towards this new focus will be marked by a series of special issues covering hot and emergent topics.