{"title":"PAMR: Persian Abstract Meaning Representation Corpus","authors":"Nasim Tohidi, Chitra Dadkhah, Reza Nouralizadeh Ganji, Ehsan Ghaffari Sadr, Hoda Elmi","doi":"10.1145/3638288","DOIUrl":"https://doi.org/10.1145/3638288","url":null,"abstract":"<p>One of the most used and well-known semantic representation models is Abstract Meaning Representation (AMR). This representation has had numerous applications in natural language processing tasks in recent years. Currently, for English and Chinese languages, large annotated corpora are available. Besides, in some low-recourse languages, related corpora have been generated with less size. Although, till now to the best of our knowledge, there is not any AMR corpus for the Persian/Farsi language. Therefore, the aim of this paper is to create a Persian AMR (PAMR) corpus via translating English sentences and adjusting AMR guidelines and to solve the various challenges that are faced in this regard. The result of this research is a corpus, containing 1020 Persian sentences and their related AMR which can be used in various natural language processing tasks. In this paper, to investigate the feasibility of using the corpus, we have applied it to two natural language processing tasks: Sentiment Analysis and Text Summarization.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"12 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139508596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NLP-enabled Recommendation of Hashtags for Covid based Tweets using Hybrid BERT-LSTM Model","authors":"Kirti Jain, Rajni Jindal","doi":"10.1145/3640812","DOIUrl":"https://doi.org/10.1145/3640812","url":null,"abstract":"<p>Hashtags have become a new trend to summarize the feelings, sentiments, emotions, swinging moods, food tastes and much more. It also represents various entities like places, families and friends. It is a way to search and categorize various stuff on social media sites. With the increase in the hashtagging, there is a need to automate it, leading to the term “Hashtag Recommendation”. Also, there are plenty of posts on social media sites which remain untagged. These untagged posts get filtered out while searching and categorizing the data using a label. Such posts do not make any contribution to any helpful insight and remain abandoned. But, if the user of such posts is recommended by labels according to his post, then he may choose one or more of them, thus making the posts labelled. For such cases Hashtag recommendation comes into the picture. Although much research work has been done on Hashtag Recommendation using traditional Deep Learning approaches, not much work has been done using NLP based Bert Embedding. In this paper, we have proposed a model, BELHASH, Bert Embedding based LSTM for Hashtag Recommendation. This task is considered as a Multilabel Classification task as the hashtags are one-hot encoded into multiple binary vectors of zeros and ones using MultiLabelBinarizer. This model has been evaluated on Covid 19 tweets. We have achieved 0.72 accuracy, 0.7 Precision, 0.66 Recall and 0.67 F1-Score. This is the first paper of hashtag recommendation to the best of our knowledge combining Bert embedding with LSTM model and achieving the state of the arts results.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"8 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139476667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Criteria Decision-Making Framework with Fuzzy Queries for Multimedia Data Fusion","authors":"Khalid Haseeb, Irshad Ahmad, Mohammad Siraj, Naveed Abbas, Gwanggil Jeon","doi":"10.1145/3640339","DOIUrl":"https://doi.org/10.1145/3640339","url":null,"abstract":"<p>Multimedia Internet of Things (MIoT) is widely explored in many smart applications for connectivity with wireless communication. Such networks are not like ordinary networks because it has to collect a massive amount of data and are further forwarded to processing systems. As MIoT is very limited in terms of resources for healthcare, smart homes, etc., therefore, energy efficiency with reliable data transmission is a significant research challenge. As smart applications rely on bounded constraints, therefore duplicate and unnecessary data transmission should be minimized. In addition, the timely delivery of data in crucial circumstances has a significant impact on any proposed system. Consequently, this paper presents a fuzzy logic-based edge computing framework to provide cooperative decision-making while avoiding inefficient use of the sensing power of smart devices. The proposed framework can be applied to critical applications to improve response time and processing cost. It consists of the following two functional components: Firstly, it provides the automated routing process with a natural language interface at the sink node. Secondly, to ensure reasonable performance, it also transmits semantic data between sensors using fuzzy queries and security. According to the performance evaluation, the proposed framework significantly outperformed related studies in terms of energy consumption, packet overhead, network throughput, and end-to-end delay.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"255 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139476495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Arabic Sentiment Analysis for ChatGPT Using Machine Learning Classification Algorithms: A Hyperparameter Optimization Technique","authors":"Ahmad Nasayreh, Rabia Emhamed Al Mamlook, Ghassan Samara, Hasan Gharaibeh, Mohammad Aljaidi, Dalia Alzu'Bi, Essam Al-Daoud, Laith Abualigah","doi":"10.1145/3638285","DOIUrl":"https://doi.org/10.1145/3638285","url":null,"abstract":"<p>In the realm of ChatGPT's language capabilities, exploring Arabic Sentiment Analysis emerges as a crucial research focus. This study centers on ChatGPT, a popular machine learning model engaging in dialogues with users, garnering attention for its exceptional performance and widespread impact, particularly in the Arab world. The objective is to assess people's opinions about ChatGPT, categorizing them as positive or negative. Despite abundant research in English, there is a notable gap in Arabic studies. We assembled a dataset from Twitter, comprising 2,247 tweets, classified by Arabic language specialists. Employing various machine learning algorithms, including Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), and Naive Bayes (NB), we implemented hyperparameter optimization techniques such as Bayesian optimization, Grid Search, and random search to select the best hyperparameters which contribute to achieve the best performance. Through training and testing, performance enhancements were observed with optimization algorithms. SVM exhibited superior performance, achieving 90% accuracy, 88% precision, 95% recall, and 91% F1 score with Grid Search. These findings contribute valuable insights into ChatGPT's impact in the Arab world, offering a comprehensive understanding of sentiment analysis through machine learning methodologies.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"255 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139469368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SEEUNRS: Semantically-Enriched-Entity-Based Urdu News Recommendation System","authors":"Safia Kanwal, Muhammad Kamran Malik, Zubair Nawaz, Khawar Mehmood","doi":"10.1145/3639049","DOIUrl":"https://doi.org/10.1145/3639049","url":null,"abstract":"<p>The advancement in the production, distribution, and consumption of news has fostered easy access to the news with fair challenges. The main challenge is to present the right news to the right audience. News recommendation system is one of the technological solutions to this problem. Much work has been done on news recommendation systems for the major languages of the world, but trivial work has been done for resource-poor languages like Urdu. Another significant hurdle in the development of an efficient news recommendation system is the Scarcity of an accessible and suitable Urdu dataset. To this end, an Urdu news mobile application was used to collect the news data and user feedback for one month. After refinement, the first-ever Urdu dataset of 100 users and 23250 news is curated for the Urdu news recommendation system. In addition, a Semantically-Enriched-Entity-Based Urdu News Recommendation System (SEEUNRS) is proposed. The proposed scheme exploits the hidden features of a news article and entities to suggest the right article to the right audience. Results have shown that the presented model has an improvement of 6.9% in the F-1 measure from traditional recommendation system techniques.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"124 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139422301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Granularity Knowledge Sharing in Low-Resource Neural Machine Translation","authors":"Chenggang Mi, Shaoliang Xie, Yi Fan","doi":"10.1145/3639930","DOIUrl":"https://doi.org/10.1145/3639930","url":null,"abstract":"As the rapid development of deep learning methods, neural machine translation (NMT) has attracted more and more attention in recent years. However, lack of bilingual resources decreases the performance of the low-resource NMT model seriously. To overcome this problem, several studies put their efforts on knowledge transfer from high-resource language pairs to low-resource language pairs. However, these methods usually focus on one single granularity of language and the parameter sharing among different granularities in NMT is not well studied. In this paper, we propose to improve the parameter sharing in low-resource NMT by introducing multi-granularity knowledge such as word, phrase and sentence. This knowledge can be monolingual and bilingual. We build the knowledge sharing model for low-resource NMT based on a multi-task learning (MTL) framework, three auxiliary tasks such as syntax parsing, cross-lingual named entity recognition and natural language generation are selected for the low-resource NMT. Experimental results show that the proposed method consistently outperforms six strong baseline systems on several low-resource language pairs.","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"16 17","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139443142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transliteration Characteristics in Romanized Assamese Language Social Media Text and Machine Transliteration","authors":"Hemanta Baruah, Sanasam Ranbir Singh, Priyankoo Sarmah","doi":"10.1145/3639565","DOIUrl":"https://doi.org/10.1145/3639565","url":null,"abstract":"<p>This article aims to understand different transliteration behaviors of Romanized Assamese text on social media. Assamese, a language that belongs to the Indo-Aryan language family, is also among the 22 scheduled languages in India. With the increasing popularity of social media in India and also the common use of the English Qwerty keyboard, Indian users on social media express themselves in their native languages, but using the Roman/Latin script. Unlike some other popular South Asian languages (say <b>Pinyin</b> for Chinese), Indian languages do not have a common standard romanization convention for writing on social media platforms. Assamese and English are two very different orthographical languages. Thus, considering both orthographic and phonemic characteristics of the language, this study tries to explain how Assamese vowels, vowel diacritics, and consonants are represented in Roman transliterated form. From a dataset of romanized Assamese social media texts collected from three popular social media sites: (Facebook, YouTube and Twitter), we have manually labeled them with their native Assamese script. A comparison analysis is also carried out between the transliterated Assamese social media texts with six different Assamese romanization schemes that reflect how Assamese users on social media do not adhere to any fixed romanization scheme. We have built three separate character-level transliteration models from our dataset. One using a traditional phrase-based statistical machine transliteration model, (1). PBSMT model and two separate neural transliteration models: (2). BiLSTM neural seq2seq model with attention, and (3). Neural transformer model. A thorough error analysis has been performed on the transliteration result obtained from the three state-of-the-art models mentioned above. This may help to build a more robust machine transliteration system for the Assamese social media domain in the future. Finally, an attention analysis experiment is also carried out with the help of attention weight scores taken from the character-level BiLSTM neural seq2seq transliteration model built from our dataset.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"54 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139373283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revolutionizing Healthcare: NLP, Deep Learning, and WSN Solutions for Managing the COVID-19 Crisis","authors":"Ajay P., Nagaraj B., R. Arun Kumar","doi":"10.1145/3639566","DOIUrl":"https://doi.org/10.1145/3639566","url":null,"abstract":"<p>The COVID-19 outbreak in 2020 catalyzed a global socio-economic upheaval, compelling nations to embrace digital technologies as a means of countering economic downturns and ensuring efficient communication systems. This paper delves into the role of Natural Language Processing (NLP) in harnessing wireless connectivity during the pandemic. The examination assesses how wireless networks have affected various facets of crisis management, including virus tracking, optimizing healthcare, facilitating remote education, and enabling unified communications. Additionally, the article underscores the importance of digital inclusion in mitigating disease outbreaks and reconnecting marginalized communities. To address these challenges, a Dual CNN-based BERT model is proposed. BERT model is used to extract the text features, the internal layers of BERT excel at capturing intricate contextual details concerning words and phrases, rendering them highly valuable as features for a wide array of text analysis tasks. The significance of dual CNN is capturing the unique capability to seamlessly integrate both character-level and word-level information. This fusion of insights from different levels of textual analysis proves especially valuable in handling text data that is noisy, complex, or presents challenges related to misspellings and domain-specific terminology. The proposed model is evaluated using the simulated WSN-based text data for crisis management.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"54 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139373153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Mental Health Analysis in Social Media for Low-resourced Languages","authors":"Muskan Garg","doi":"10.1145/3638761","DOIUrl":"https://doi.org/10.1145/3638761","url":null,"abstract":"<p>The surge in internet use for expression of personal thoughts and beliefs has made it increasingly feasible for the social Natural Language Processing (NLP) research community to find and validate associations between <i>social media posts</i> and <i>mental health status</i>. Cross-sectional and longitudinal studies of low-resourced social media data bring to fore the importance of real-time responsible Artificial Intelligence (AI) models for mental health analysis in native languages. Aiming to classify research for social computing and tracking advances in the development of learning-based models, we propose a comprehensive survey on <i>mental health analysis for social media</i> and posit the need of analyzing <i>low-resourced social media data for mental health</i>. We first classify three components for computing on social media as: <b>SM</b>- data mining/ natural language processing on <i>social media</i>, <b>IA</b>- <i>integrated applications</i> with social media data and user-network modeling, and <b>NM</b>- user and <i>network modeling</i> on social networks. To this end, we posit the need of mental health analysis in different languages of East Asia (e.g. Chinese, Japanese, Korean), South Asia (Hindi, Bengali, Tamil), Southeast Asia (Malay, Thai, Vietnamese), European languages (Spanish, French) and the Middle East (Arabic). Our comprehensive study examines available resources and recent advances in low-resourced languages for different aspects of SM, IA and NM to discover new frontiers as potential field of research.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"20 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2023-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139064404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DSISA: A New Neural Machine Translation Combining Dependency Weight and Neighbors","authors":"Lingfang Li, Aijun Zhang, Ming-Xing Luo","doi":"10.1145/3638762","DOIUrl":"https://doi.org/10.1145/3638762","url":null,"abstract":"<p>Most of the previous neural machine translations (NMT) rely on parallel corpus. Integrating explicitly prior syntactic structure information can improve the neural machine translation. In this paper, we propose a Syntax Induced Self-Attention (SISA) which explores the influence of dependence relation between words through the attention mechanism and fine-tunes the attention allocation of the sentence through the obtained dependency weight. We present a new model, Double Syntax Induced Self-Attention (DSISA), which fuses the features extracted by SISA and a compact convolution neural network (CNN). SISA can alleviate long dependency in sentence, while CNN captures the limited context based on neighbors. DSISA utilizes two different neural networks to extract different features for richer semantic representation and replaces the first layer of Transformer encoder. DSISA not only makes use of the global feature of tokens in sentences but also the local feature formed with adjacent tokens. Finally, we perform simulation experiments that verify the performance of the new model on standard corpora.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"2 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2023-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139064355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}