{"title":"Humor Detection in English-Urdu Code-Mixed Language","authors":"S. Bukhari, Anusha Zubair, Muhammad Umair Arshad","doi":"10.1109/ICAI58407.2023.10136656","DOIUrl":null,"url":null,"abstract":"This research proposes a novel approach for de-tecting humor in code-mixed English-Urdu (Roman Urdu) text. Our approach combines advanced deep learning algorithms, machine learning, and transfer learning algorithms to classify code-mixed text as humorous or non-humorous. We used deep learning algorithms like CNN(Convolutional Neural Networks), LSTM(Long short-term memory), BiLSTM, and a hybrid model made from their combination after some hyper-tuning. We found that the hybrid CNN-BiLSTM model had an accuracy of approximately 75%, while XLM-RoBERTa outperformed all other models with an accuracy of 77.04 %. This is the first time these approaches have been applied to code-mixed Roman Urdu, a low-resource language.","PeriodicalId":161809,"journal":{"name":"2023 3rd International Conference on Artificial Intelligence (ICAI)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 3rd International Conference on Artificial Intelligence (ICAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAI58407.2023.10136656","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
This research proposes a novel approach for detecting humor in code-mixed English-Urdu (Roman Urdu) text. Our approach combines deep learning, machine learning, and transfer learning algorithms to classify code-mixed text as humorous or non-humorous. We used deep learning models such as CNN (Convolutional Neural Network), LSTM (Long Short-Term Memory), and BiLSTM, as well as a hybrid model built from their combination after hyperparameter tuning. We found that the hybrid CNN-BiLSTM model achieved an accuracy of approximately 75%, while XLM-RoBERTa outperformed all other models with an accuracy of 77.04%. This is the first time these approaches have been applied to code-mixed Roman Urdu, a low-resource language.
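To make the hybrid architecture concrete, the sketch below outlines a CNN-BiLSTM binary classifier in Keras. The abstract does not report the vocabulary size, sequence length, layer sizes, or exact layer ordering, so every hyperparameter here, and the choice of Keras itself, is an illustrative assumption rather than the authors' configuration.

```python
# Minimal sketch of a hybrid CNN-BiLSTM humor classifier for Roman Urdu text.
# All hyperparameters below (vocabulary size, sequence length, filter counts,
# etc.) are assumed for illustration; the paper's actual settings are not given.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000   # assumed vocabulary size after tokenizing Roman Urdu text
MAX_LEN = 64         # assumed maximum sequence length per post
EMBED_DIM = 128      # assumed embedding dimension

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    # 1-D convolution extracts local n-gram features from the embedded tokens.
    layers.Conv1D(filters=128, kernel_size=3, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    # The bidirectional LSTM models longer-range context in both directions.
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dropout(0.5),
    # Single sigmoid unit: humorous (1) vs. non-humorous (0).
    layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The XLM-RoBERTa result quoted in the abstract would instead come from fine-tuning a pretrained multilingual transformer (for example, via Hugging Face's XLMRobertaForSequenceClassification) on the same binary labels; that pipeline is not shown here.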