Automatic Algerian Sarcasm Detection from Texts and Images
Authors: Kheira Zineb Bousmaha, Khaoula Hamadouche, Hadjer Djouabi, Lamia Hadrich-Belguith
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
Published: 2024-06-03
DOI: 10.1145/3670403 (https://doi.org/10.1145/3670403)
Citations: 0
Abstract
In recent years, the number of Algerian Internet users has increased significantly, creating a valuable opportunity to collect and exploit the opinions and sentiments they express online, increasingly through images as well as text. To benefit from this wealth of information, however, sentiment analysis must address sarcasm, which often relies on non-literal and ambiguous language and is therefore difficult to detect. Effective sarcasm detection improves the quality and relevance of sentiment analysis, allowing online opinions to be fully exploited for a better understanding of trends and sentiments among the Algerian public. In this work, we develop a comprehensive system for sarcasm detection in the Algerian dialect that covers both text and image analysis. For text, we propose a hybrid approach that combines linguistic features with machine learning techniques. For images, we use the VGG-19 deep learning model for image classification and the EasyOCR library to extract Arabic text. By integrating these approaches, we aim to build a robust system capable of detecting sarcasm in both textual and visual Algerian-dialect content. Our system achieved an accuracy of 92.79% with the textual models and 89.28% with the visual model.
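The abstract describes the text branch only as a hybrid of linguistic characteristics and machine learning. The Python sketch below illustrates one generic way such a hybrid can be wired together; it is not the authors' method. Every detail is an assumption: the cue features (exclamation and question marks, laughter strings such as `hhhh` or `هههه`, letter elongation, a few emoji), the binary bag-of-words representation, the logistic regression classifier trained with plain stochastic gradient descent, and the invented toy strings. The helper names `featurize`, `train`, and `predict_proba` are hypothetical.

```python
import math
import re
from collections import defaultdict

# Hypothetical sarcasm cues for informal Algerian posts (illustrative
# assumptions, not the feature set used in the paper).
LAUGHTER = re.compile(r"h{3,}|ه{3,}", re.IGNORECASE)  # "hhhh" or "هههه"
ELONGATION = re.compile(r"(\w)\1{2,}")  # any word character repeated 3+ times
EMOJI_CUES = ("😂", "🤣", "🙄", "😏")


def featurize(text):
    """Combine handcrafted linguistic cues with binary bag-of-words features."""
    feats = {
        "cue:exclamations": float(text.count("!")),
        "cue:questions": float(text.count("?")),
        "cue:laughter": 1.0 if LAUGHTER.search(text) else 0.0,
        "cue:elongations": float(len(ELONGATION.findall(text))),
        "cue:emoji": float(sum(text.count(e) for e in EMOJI_CUES)),
    }
    for token in re.findall(r"\w+", text.lower()):
        feats["word:" + token] = 1.0
    return feats


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, bias, feats):
    # .get avoids inserting unseen features into the weight table.
    return bias + sum(weights.get(k, 0.0) * v for k, v in feats.items())


def train(examples, epochs=200, lr=0.1):
    """Fit a logistic regression with stochastic gradient descent."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            feats = featurize(text)
            error = label - sigmoid(score(weights, bias, feats))
            bias += lr * error
            for k, v in feats.items():
                weights[k] += lr * error * v
    return weights, bias


def predict_proba(model, text):
    """Estimated probability that `text` is sarcastic."""
    weights, bias = model
    return sigmoid(score(weights, bias, featurize(text)))


# Invented toy strings (Arabizi-style); label 1 = sarcastic, 0 = literal.
TOY_DATA = [
    ("bravo 3lik!!! hhhh 😂", 1),
    ("wow mlih bezzaf!! hhhhh", 1),
    ("bravo 3lik, khdma mlih", 0),
    ("merci bezzaf", 0),
]

model = train(TOY_DATA)
print(round(predict_proba(model, "bravo hhhh!!!"), 3))
print(round(predict_proba(model, "bravo"), 3))
```

Putting the handcrafted cues and the lexical features into a single linear model keeps each cue's learned weight inspectable. On four toy strings the printed probabilities are only illustrative and say nothing about the 92.79% text accuracy reported in the abstract.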
About the journal:
The ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) publishes high-quality original archival papers and technical notes in the areas of computation and processing of information in Asian languages, low-resource languages of Africa, Australasia, Oceania and the Americas, as well as related disciplines. The subject areas covered by TALLIP include, but are not limited to:
- Computational Linguistics: including computational phonology, computational morphology, computational syntax (e.g. parsing), computational semantics, computational pragmatics, etc.
- Linguistic Resources: including computational lexicography, terminology, electronic dictionaries, cross-lingual dictionaries, electronic thesauri, etc.
- Hardware and software algorithms and tools for Asian or low-resource language processing, e.g., handwritten character recognition.
- Information Understanding: including text understanding, speech understanding, character recognition, discourse processing, dialogue systems, etc.
- Machine Translation involving Asian or low-resource languages.
- Information Retrieval: including natural language processing (NLP) for concept-based indexing, natural language query interfaces, semantic relevance judgments, etc.
- Information Extraction and Filtering: including automatic abstraction, user profiling, etc.
- Speech processing: including text-to-speech synthesis and automatic speech recognition.
- Multimedia Asian Information Processing: including speech, image, video, image/text translation, etc.
- Cross-lingual information processing involving Asian or low-resource languages.
Papers that deal with theory, systems design, evaluation, and applications in the aforementioned subjects are appropriate for TALLIP. Emphasis will be placed on the originality and the practical significance of the reported research.