{"title":"A DENSE SPATIAL NETWORK MODEL FOR EMOTION RECOGNITION USING LEARNING APPROACHES","authors":"L. V., Dinesh Kumar Anguraj","doi":"10.1145/3688000","DOIUrl":"https://doi.org/10.1145/3688000","url":null,"abstract":"Researchers are increasingly eager to develop techniques to extract emotional data from new sources due to the exponential growth of subjective information on Web 2.0. One of the most challenging aspects of textual emotion detection is the collection of data with emotion labels, given the subjectivity involved in labeling emotions. To address this significant issue, our research aims to aid in the development of effective solutions. We propose a Deep Convolutional Belief-based Spatial Network Model (DCB-SNM) as a semi-automated technique to tackle this challenge. This model involves two basic phases of analysis: text and video. In this process, pre-trained annotators identify the dominant emotion. Our work evaluates the impact of this automatic pre-annotation approach on manual emotion annotation from the perspectives of annotation time and agreement. The data on annotation time indicates an increase of roughly 20% when the pre-annotation procedure is utilized, without negatively affecting the annotators' skill. This demonstrates the benefits of pre-annotation approaches. 
Additionally, pre-annotation proves to be particularly advantageous for contributors with low prediction accuracy, enhancing overall annotation efficiency and reliability.","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141920813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning and Vision-based approach for Human fall detection and classification in naturally occurring scenes using video data","authors":"Shashvat Singh, Kumkum Kumari, A. Vaish","doi":"10.1145/3687125","DOIUrl":"https://doi.org/10.1145/3687125","url":null,"abstract":"The advancement of medicine presents challenges for modern cultures, especially with unpredictable elderly falling incidents anywhere due to serious health issues. Delayed rescue for at-risk elders can be dangerous. Traditional elder safety methods like video surveillance or wearable sensors are inefficient and burdensome, wasting human resources and requiring caregivers' constant fall detection monitoring. Thus, a more effective and convenient solution is needed to ensure elderly safety. In this paper, a method is presented for detecting human falls in naturally occurring scenes using videos through a traditional Convolutional Neural Network (CNN) model, Inception-v3, VGG-19 and two versions of the You Only Look Once (YOLO) working model. The primary focus of this work is human fall detection through the utilization of deep learning models. Specifically, the YOLO approach is adopted for object detection and tracking in video scenes. By implementing YOLO, human subjects are identified, and bounding boxes are generated around them. The classification of various human activities, including fall detection is accomplished through the analysis of deformation features extracted from these bounding boxes. The traditional CNN model achieves an impressive 99.83% accuracy in human fall detection, surpassing other state-of-the-art methods. 
Its training time is longer than that of YOLO-v2 and YOLO-v3, but significantly shorter than that of Inception-v3, amounting to only around 10% of the latter's total training time.","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141920202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CNN-Based Models for Emotion and Sentiment Analysis Using Speech Data","authors":"Anjum Madan, Devender Kumar","doi":"10.1145/3687303","DOIUrl":"https://doi.org/10.1145/3687303","url":null,"abstract":"The study aims to present an in-depth Sentiment Analysis (SA) grounded by the presence of emotions in the speech signals. Nowadays, all kinds of web-based applications ranging from social media platforms and video-sharing sites to e-commerce applications provide support for Human-Computer Interfaces (HCIs). These media applications allow users to share their experiences in all forms such as text, audio, video, GIF, etc. The most natural and fundamental form of expressing oneself is through speech. Speech-Based Sentiment Analysis (SBSA) is the task of gaining insights into speech signals. It aims to classify the statement as neutral, negative, or positive. On the other hand, Speech Emotion Recognition (SER) categorizes speech signals into the following emotions: disgust, fear, sadness, anger, happiness, and neutral. It is necessary to recognize the sentiments along with the profoundness of the emotions in the speech signals. To cater to the above idea, a methodology is proposed defining a text-oriented SA model using the combination of CNN and Bi-LSTM techniques along with an embedding layer, applied to the text obtained from speech signals; achieving an accuracy of 84.49%. Also, the proposed methodology suggests an Emotion Analysis (EA) model based on the CNN technique highlighting the type of emotion present in the speech signal with an accuracy measure of 95.12%. 
The presented architecture can also be applied to other domains such as product review systems, video recommendation systems, education, health, and security.","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141927347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Semantic Information Extraction of Tibetan Opera Mask with Recall Loss","authors":"yao wen, jie li, Donghong Cai, Zhicheng Dong, Fangkai Cai, Ping Lan, quan zhou","doi":"10.1145/3666041","DOIUrl":"https://doi.org/10.1145/3666041","url":null,"abstract":"With the development of artificial intelligence, natural language processing enables us to better understand and utilize semantic information. However, traditional object detection algorithms cannot get an effective performance, when dealed with Tibetan opera mask datasets which have the properties of limited samples, symmetrical patterns and high inter-class distances. In order to solve this issue, we propose a novel feature representation model with recall loss function for detecting different marks. In the model, we develop an adaptive feature extraction network with fused layers to extract features. Furthermore, a lightweight efficient attention mechanism is designed to enhance the significance of key features. Additionally, a recall loss function is proposed to increase the differences among classes. Finally, experimental results on the dataset of Tibetan opera mask demonstrate that our proposed model outperforms compared models.","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141799656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TRGCN: A Prediction Model for Information Diffusion Based on Transformer and Relational Graph Convolutional Network","authors":"Jinghua Zhao, Xiting Lyu, Haiying Rong, Jiale Zhao","doi":"10.1145/3672074","DOIUrl":"https://doi.org/10.1145/3672074","url":null,"abstract":"In order to capture and integrate structural features and temporal features contained in social graph and diffusion cascade more effectively, an information diffusion prediction model based on Transformer and Relational Graph Convolutional Network (TRGCN) is proposed. Firstly, a dynamic heterogeneous graph composed of the social network graph and the diffusion cascade graph was constructed, and it was input into the Relational Graph Convolutional Network (RGCN) to extract the structural features of each node. Secondly, the time embedding of each node was re-encoded using Bi-directional Long Short-Term Memory (Bi-LSTM). The time decay function was introduced to give different weights to nodes at different time positions, so as to obtain the temporal features of nodes. Finally, structural features and temporal features were input into Transformer and then merged. The spatial-temporal features are obtained for information diffusion prediction. The experimental results on three real data sets of Twitter, Douban and Memetracker show that compared with the optimal model in the comparison experiment, the TRGCN model has an average increase of 4.16% in Hits@100 metric and 13.26% in map@100 metric. 
These results demonstrate the validity and soundness of the model.","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141799053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Word Sense Disambiguation Combining Knowledge Graph And Text Hierarchical Structure","authors":"Yukun Cao, Chengkun Jin, Yijia Tang, Ziyue Wei","doi":"10.1145/3677524","DOIUrl":"https://doi.org/10.1145/3677524","url":null,"abstract":"Current supervised word sense disambiguation models have obtained high disambiguation results using annotated information of different word senses and pre-trained language models. However, the semantic data of the supervised word sense disambiguation models are in the form of short texts, and many of the corpus information is not rich enough to distinguish the semantics in different scenarios. The paper proposes a bi-encoder word sense disambiguation method combining knowledge graph and text hierarchy structure, by introducing structured knowledge from the knowledge graph to supplement more extended semantic information, using the hierarchy of contextual input text to describe the meaning of words and phrases, and constructing a BERT-based bi-encoder, introducing a graph attention network to reduce the noise information in the contextual input text, so as to improve the disambiguation accuracy of the target words in phrase form and ultimately improve the disambiguation effectiveness of the method. 
In comparisons with nine recent algorithms on five test datasets, the method's disambiguation accuracy outperforms that of the comparison algorithms in most cases and achieves better overall results.","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141802527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring the Correlation between Emojis and Mood Expression in Thai Twitter Discourse","authors":"Attapol T. Rutherford, Pawitsapak Akarajaradwong","doi":"10.1145/3680543","DOIUrl":"https://doi.org/10.1145/3680543","url":null,"abstract":"Mood, a long-lasting affective state detached from specific stimuli, plays an important role in behavior. Although sentiment analysis and emotion classification have garnered attention, research on mood classification remains in its early stages. This study adopts a two-dimensional structure of affect, comprising ”pleasantness” and ”activation,” to classify mood patterns. Emojis, graphic symbols representing emotions and concepts, are widely used in computer-mediated communication. Unlike previous studies that consider emojis as direct labels for emotion or sentiment, this work uses a pre-trained large language model which integrates both text and emojis to develop a mood classification model. Our contributions are three-fold. First, we annotate 10,000 Thai tweets with mood to train the models and release the dataset to the public. Second, we show that emojis contribute to determining mood to a lesser extent than text, far from mapping directly to mood. Third, through the application of the trained model, we observe the correlation of moods during the Thai political turmoil of 2019-2020 on Thai Twitter and find a significant correlation. 
These moods closely reflect the news events and reveal one side of Thai public opinion during the turmoil.","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141809824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Translation from Tunisian Dialect to Modern Standard Arabic: Exploring Finite-State Transducers and Sequence-to-Sequence Transformer Approaches","authors":"Roua Torjmen, K. Haddar","doi":"10.1145/3681788","DOIUrl":"https://doi.org/10.1145/3681788","url":null,"abstract":"Translation from the mother tongue, including the Tunisian dialect, to modern standard Arabic is a highly significant field in natural language processing due to its wide range of applications and associated benefits. Recently, researchers have shown increased interest in the Tunisian dialect, primarily driven by the massive volume of content generated spontaneously by Tunisians on social media follow-ing the revolution. This paper presents two distinct translators for converting the Tunisian dialect into Modern Standard Arabic. The first translator utilizes a rule-based approach, employing a collection of finite state transducers and a bilingual dictionary derived from the study corpus. On the other hand, the second translator relies on deep learning models, specifically the sequence-to-sequence trans-former model and a parallel corpus. To assess, evaluate, and compare the performance of the two translators, we conducted tests using a parallel corpus comprising 8,599 words. The results achieved by both translators are noteworthy. 
The translator based on finite-state transducers achieved a BLEU score of 56.65, while the transformer-based translator achieved a higher score of 66.07.","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141809007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing the Effects of Transcription Errors on Summary Generation of Bengali Spoken Documents","authors":"Priyanjana Chowdhury, Nabanika Sarkar, Sanghamitra Nath, Utpal Sharma","doi":"10.1145/3678005","DOIUrl":"https://doi.org/10.1145/3678005","url":null,"abstract":"Automatic speech recognition (ASR) has become an indispensable part of the AI domain, with various speech technologies reliant on it. The quality of speech recognition depends on the amount of annotated data used to train an ASR system, among other factors. For a low-resourced language, this is a severe constraint and thus ASR quality is often poor. Humans can read through text containing ASR-errors, provided the context of the sentence is preserved. Yet in cases of transcripts generated by ASR systems of low-resource languages, multiple important words are misrecognized and the context is mostly lost; discerning such a text becomes nearly impossible. This paper analyzes the types of transcription errors that occur while generating ASR transcripts of spoken documents in Bengali, an under-resourced language predominantly spoken in India and Bangladesh. The transcripts of the Bengali spoken document are generated using the ASR of Google Cloud Speech. The paper also explores if there is an effect of such transcription errors in generating speech summaries of these spoken documents. Summarization is carried out extractively; sentences are selected from the ASR-generated text of the spoken document. Speech summaries are created by aggregating the speech-segments from the original speech of the selected sentences. 
Subjective evaluation shows that the ‘readability’ of the spoken summaries is not degraded by ASR errors, but their quality is affected by the reliance on an intermediate text summary containing transcription errors.","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141830350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CoMix: Confronting with Noisy Label Learning with Co-training Strategies on Textual Mislabeling","authors":"Shu Zhao, Zhuoer Zhao, Yangyang Xu, Xiao Sun","doi":"10.1145/3678175","DOIUrl":"https://doi.org/10.1145/3678175","url":null,"abstract":"The existence of noisy labels is inevitable in real-world large-scale corpora. As deep neural networks are notably vulnerable to overfitting on noisy samples, this highlights the importance of the ability of language models to resist noise for efficient training. However, little attention has been paid to alleviating the influence of label noise in natural language processing. To address this problem, we present CoMix, a robust Noise-Against training strategy taking advantage of Co-training that deals with textual annotation errors in text classification tasks. In our proposed framework, the original training set is first split into labeled and unlabeled subsets according to a sample partition criteria and then applies label refurbishment on the unlabeled subsets. We implement textual interpolation in hidden space between samples on the updated subsets. Meanwhile, we employ peer diverged networks simultaneously leveraging co-training strategies to avoid the accumulation of confirm bias. 
Experimental results on three popular text classification benchmarks demonstrate the effectiveness of CoMix in bolstering the network’s resistance to mislabeling under various noise types and ratios; CoMix also outperforms state-of-the-art methods.","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141645329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}