{"title":"Multization: Multi-Modal Summarization Enhanced by Multi-Contextually Relevant and Irrelevant Attention Alignment","authors":"Huan Rong, Zhongfeng Chen, Zhenyu Lu, Fan Xu, Victor S. Sheng","doi":"10.1145/3651983","DOIUrl":"https://doi.org/10.1145/3651983","url":null,"abstract":"<p>This paper focuses on the task of Multi-Modal Summarization with Multi-Modal Output for China JD.COM e-commerce product description containing both source text and source images. In the context learning of multi-modal (text and image) input, there exists a semantic gap between text and image, especially in the cross-modal semantics of text and image. As a result, capturing shared cross-modal semantics earlier becomes crucial for multi-modal summarization. On the other hand, when generating the multi-modal summarization, based on the different contributions of input text and images, the relevance and irrelevance of multi-modal contexts to the target summary should be considered, so as to optimize the process of learning cross-modal context to guide the summary generation process and to emphasize the significant semantics within each modality. To address the aforementioned challenges, Multization has been proposed to enhance multi-modal semantic information by multi-contextually relevant and irrelevant attention alignment. Specifically, a Semantic Alignment Enhancement mechanism is employed to capture shared semantics between different modalities (text and image), so as to enhance the importance of crucial multi-modal information in the encoding stage. Additionally, the IR-Relevant Multi-Context Learning mechanism is utilized to observe the summary generation process from both relevant and irrelevant perspectives, so as to form a multi-modal context that incorporates both text and image semantic information. The experimental results in the China JD.COM e-commerce dataset demonstrate that the proposed Multization method effectively captures the shared semantics between the input source text and source images, and highlights essential semantics. It also successfully generates the multi-modal summary (including image and text) that comprehensively considers the semantics information of both text and image.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"65 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140072938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Am I hurt?: Evaluating Psychological Pain Detection in Hindi Text using Transformer-based Models","authors":"Ravleen Kaur, M. P. S. Bhatia, Akshi Kumar","doi":"10.1145/3650206","DOIUrl":"https://doi.org/10.1145/3650206","url":null,"abstract":"<p>The automated evaluation of pain is critical for developing effective pain management approaches that seek to alleviate while preserving patients’ functioning. Transformer-based models can aid in detecting pain from Hindi text data gathered from social media by leveraging their ability to capture complex language patterns and contextual information. By understanding the nuances and context of Hindi text, transformer models can effectively identify linguistic cues, sentiment and expressions associated with pain enabling the detection and analysis of pain-related content present in social media posts. The purpose of this research is to analyse the feasibility of utilizing NLP techniques to automatically identify pain within Hindi textual data, providing a valuable tool for pain assessment in Hindi-speaking populations. The research showcases the HindiPainNet model, a deep neural network that employs the IndicBERT model, classifying the dataset into two class labels {pain, no_pain} for detecting pain in Hindi textual data. The model is trained and tested using a novel dataset, दर्द-ए-शायरी (pronounced as <i>Dard-e-Shayari</i>) curated using posts from social media platforms. The results demonstrate the model's effectiveness, achieving an accuracy of 70.5%. This pioneer research highlights the potential of utilizing textual data from diverse sources to identify and understand pain experiences based on psychosocial factors. This research could pave the path for the development of automated pain assessment tools that help medical professionals comprehend and treat pain in Hindi speaking populations. Additionally, it opens avenues to conduct further NLP-based multilingual pain detection research, addressing the needs of diverse language communities.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"108 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140034668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TransVAE-PAM: A Combined Transformer and DAG-based Approach for Enhanced Fake News Detection in Indian Context","authors":"Shivani Tufchi, Tanveer Ahmed, Ashima Yadav, Krishna Kant Agrawal, Ankit Vidyarthi","doi":"10.1145/3651160","DOIUrl":"https://doi.org/10.1145/3651160","url":null,"abstract":"<p>In this study, we introduce a novel method, “TransVAE-PAM”, for the classification of fake news articles, tailored specifically for the Indian context. The approach capitalizes on state-of-the-art contextual and sentence transformer-based embedding models to generate article embeddings. Furthermore, we also try to address the issue of compact model size. In this respect, we employ a Variational Autoencoder (VAE) and <i>β</i>-VAE to reduce the dimensions of the embeddings, thereby yielding compact latent representations. To capture the thematic essence or important topics in the news articles, we use the Pachinko Allocation Model (PAM) model, a Directed Acyclic Graph (DAG) based approach, to generate meaningful topics. These two facets of representation - the reduced-dimension embeddings from the VAE and the extracted topics from the PAM model - are fused together to create a feature set. This representation is subsequently channeled into five different methods for fake news classification. Furthermore, we use eight distinct transformer-based architectures to test the embedding generation. To validate the feasibility of the proposed approach, we have conducted extensive experimentation on a proprietary dataset. The dataset is sourced from “Times of India” and other online media. Considering the size of the dataset, large-scale experiments are conducted on an NVIDIA supercomputer. Through this comprehensive numerical investigation, we have achieved an accuracy of 96.2% and an F1 score of 96% using the DistilBERT transformer architecture. By complementing the method via topic modeling, we record a performance improvement with the accuracy and F1 score both at 97%. These results indicate a promising direction toward leveraging the combination of advanced topic models into existing classification schemes to enhance research on fake news detection.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"119 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140057217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Opinion Mining on Social Media Text Using Optimized Deep Belief Networks","authors":"S. Vinayaga Vadivu, P. Nagaraj, B. S. Murugan","doi":"10.1145/3649502","DOIUrl":"https://doi.org/10.1145/3649502","url":null,"abstract":"<p>In the digital world, most people spend their leisure and precious time on social media networks such as Facebook, Twitter. Instagram, and so on. Moreover, users post their views of products, services, political parties on their social sites. This information is viewed by many other users and brands. With the aid of these posts and tweets, the emotions, polarities of users are extracted to obtain the opinion about products or services. To analyze these posts sentiment analysis or opinion mining techniques are applied. Subsequently, this field rapidly attracts many researchers to conduct their research work due to the availability of an enormous number of data on social media networks. Further, this method can also be used to analyze the text to extract the sentiments which are classified as moderate, neutral, low extreme, and high extreme. However, the extraction of sentiment is an arduous one from the social media datasets, since it includes formal and informal texts, emojis, symbols. Hence to extract the feature vector from the accessed social media datasets and to perform accurate classification to group the texts based on the appropriate sentiments we proposed a novel method known as, Deep Belief Network-based Dynamic Grouping-based Cooperative optimization method DBN based DGCO. Exploiting this method the data are preprocessed to attain the required format of text and henceforth the feature vectors are extracted by the ICS algorithm. Furthermore, the extracted datasets are classified and grouped into moderate, neutral, low extreme, and high extreme with DBN based DGCO method. For experimental analysis, we have taken two social media datasets and analyzed the performance of the proposed method in terms of performance metrics such as accuracy/precision, recall, F1 Score, and ROC with HEMOS, WOA-SITO, PDCNN, and NB-LSVC state-of-art works. The acquired accuracy/precision, recall, and F1 Score, of our proposed ICS-DBN-DGCO method, are 89%, 80%, 98.2%, respectively.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"79 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140034763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Survey of Knowledge Enhanced Pre-trained Language Models","authors":"Jian Yang, Xinyu Hu, Gang Xiao, Yulong Shen","doi":"10.1145/3631392","DOIUrl":"https://doi.org/10.1145/3631392","url":null,"abstract":"<p>Pre-trained language models learn informative word representations on a large-scale text corpus through self-supervised learning, which has achieved promising performance in fields of natural language processing (NLP) after fine-tuning. These models, however, suffer from poor robustness and lack of interpretability. We refer to pre-trained language models with knowledge injection as knowledge-enhanced pre-trained language models (KEPLMs). These models demonstrate deep understanding and logical reasoning and introduce interpretability. In this survey, we provide a comprehensive overview of KEPLMs in NLP. We first discuss the advancements in pre-trained language models and knowledge representation learning. Then we systematically categorize existing KEPLMs from three different perspectives. Finally, we outline some potential directions of KEPLMs for future research.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140008632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploration on Advanced Intelligent Algorithms of Artificial Intelligence for Verb Recognition in Machine Translation","authors":"Qinghua Ai, Qingyan Ai, Jun Wang","doi":"10.1145/3649891","DOIUrl":"https://doi.org/10.1145/3649891","url":null,"abstract":"<p>This article aimed to address the problems of word order confusion, context dependency, and ambiguity in traditional machine translation (MT) methods for verb recognition. By applying advanced intelligent algorithms of artificial intelligence, verb recognition can be better processed and the quality and accuracy of MT can be improved. Based on Neural machine translation (NMT), basic attention mechanisms, historical attention information, dynamically obtain information related to the generated words, and constraint mechanisms were introduced to embed semantic information, represent polysemy, and annotate semantic roles of verbs. This article used the Workshop on machine translation (WMT), British National Corpus (BNC), Gutenberg, Reuters Corpus, OpenSubtitles corpus, and enhanced the data in the corpus. The improved NMT model was compared with traditional NMT models, Rule Based machine translation (RBMT), and Statistical machine translation (SMT). The experimental results showed that the average verb semantic matching degree of the improved NMT model in 5 corpora was 0.85, and the average Bilingual Evaluation Understudy (BLEU) score in 5 corpora was 0.90. The improved NMT model in this article can effectively improve the accuracy of verb recognition in MT, providing new methods for verb recognition in MT.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"1 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140008685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Bidirectionl LSTM with CRFs for Pashto tagging","authors":"Farooq Zaman, Onaiza Maqbool, Jaweria Kanwal","doi":"10.1145/3649456","DOIUrl":"https://doi.org/10.1145/3649456","url":null,"abstract":"<p>Part-of-speech tagging plays a vital role in text processing and natural language understanding. Very few attempts have been made in the past for tagging Pashto Part-of-Speech. In this work, we present LSTM based approach for Pashto part-of-speech tagging with special focus on ambiguity resolution. Initially we created a corpus of Pashto sentences having words with multiple meanings and their tags. We introduce a powerful sentences representation and new architecture for Pashto text processing. The accuracy of the proposed approach is compared with state-of-the-art Hidden Markov Model. Our Model shows 87.60% accuracy for all words excluding punctuations and 95.45% for ambiguous words, on the other hand Hidden Markov Model shows 78.37% and 44.72% accuracy respectively. Results show that our approach outperform Hidden Markov Model in Part-of-Speech tagging for Pashto text.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"31 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139977793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hybrid Scene Text Script Identification Network for regional Indian Languages","authors":"Veronica Naosekpam, Nilkanta Sahu","doi":"10.1145/3649439","DOIUrl":"https://doi.org/10.1145/3649439","url":null,"abstract":"<p>In this work, we introduce WAFFNet, an attention-centric feature fusion architecture tailored for word-level multi-lingual scene text script identification. Motivated by the limitations of traditional approaches that rely exclusively on feature-based methods or deep learning strategies, our approach amalgamates statistical and deep features to bridge the gap. At the core of WAFFNet, we utilized the merits of Local Binary Pattern —a prominent descriptor capturing low-level texture features with high-dimensional, semantically-rich convolutional features. This fusion is judiciously augmented by a spatial attention mechanism, ensuring targeted emphasis on semantically critical regions of the input image. To address the class imbalance problem in multi-class classification scenarios, we employed a weighted objective function. This not only regularizes the learning process but also addresses the class imbalance problem. The architectural integrity of WAFFNet is preserved through an end-to-end training paradigm, leveraging transfer learning to expedite convergence and optimize performance metrics. Considering the under-representation of regional Indian languages in current datasets, we meticulously curated IIITG-STLI2023, a comprehensive dataset encapsulating English alongside six under-represented Indian languages: Hindi, Kannada, Malayalam, Telugu, Bengali, and Manipuri. Rigorous evaluation of the IIITG-STLI2023, as well as the established MLe2e and SIW-13 datasets, underscores WAFFNet’s supremacy over both traditional feature-engineering approaches as well as state-of-the-art deep learning frameworks. Thus, the proposed WAFFNet framework offers a robust and effective solution for language identification in scene text images.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"171 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139949859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Natural Language Processing System for Text Classification Corpus Based on Machine Learning","authors":"Yawen Su","doi":"10.1145/3648361","DOIUrl":"https://doi.org/10.1145/3648361","url":null,"abstract":"<p>A classification system for hazardous materials in air traffic control was investigated using the Human Factors Analysis and Classification System (HFACS) framework and natural language processing to prevent hazardous situations in air traffic control. Based on the development of the HFACS standard, an air traffic control hazard classification system will be created. The dangerous data of the aviation safety management system is selected by dead bodies, classified and marked in 5 levels. TFIDF TextRank text classification method based on key content extraction and text classification model based on CNN and BERT model were used in the experiment to solve the problem of small samples, many labels and random samples in hazardous environment of air pollution control. The results show that the total cost of model training time and classification accuracy is the highest when the keywords are around 8. As the number of points increases, the time spent in dimensioning decreases and affects accuracy. When the number of points reaches about 93, the time spent in determining the size increases, but the accuracy of the allocation remains close to 0.7, but the increase in the value of time leads to a decrease in the total cost. It has been proven that extracting key content can solve text classification problems for small companies and contribute to further research in the development of security systems.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"25 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139910190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SCT:Summary Caption Technique for Retrieving Relevant Images in Alignment with Multimodal Abstractive Summary","authors":"Shaik Rafi, Ranjita Das","doi":"10.1145/3645029","DOIUrl":"https://doi.org/10.1145/3645029","url":null,"abstract":"<p>This work proposes an efficient Summary Caption Technique(SCT) which considers the multimodal summary and image captions as input to retrieve the correspondence images from the captions that are highly influential to the multimodal summary. Matching a multimodal summary with an appropriate image is a challenging task in computer vision (CV) and natural language processing (NLP) fields. Merging of these fields are tedious, though the research community has steadily focused on the cross-modal retrieval. These issues include the visual question-answering, matching queries with the images, and semantic relationship matching between two modalities for retrieving the corresponding image. Relevant works consider in questions to match the relationship of visual information, object detection and to match the text with visual information, and employing structural-level representation to align the images with the text. However, these techniques are primarily focused on retrieving the images to text or for the image captioning. But less effort has been spent on retrieving relevant images for the multimodal summary. Hence, our proposed technique extracts and merge features in Hybrid Image Text(HIT) layer and captions in the semantic embeddings with word2vec where the contextual features and semantic relationships are compared and matched with each vector between the modalities, with cosine semantic similarity. In cross-modal retrieval, we achieve top five related images and align the relevant images to the multimodal summary that achieves the highest cosine score among the retrieved images. The model has been trained with seq-to-seq modal with 100 epochs, besides reducing the information loss by the sparse categorical cross entropy. Further, experimenting with the multimodal summarization with multimodal output dataset (MSMO), in cross-modal retrieval, helps to evaluate the quality of image alignment with an image-precision metric that demonstrate the best results.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"7 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}