{"title":"SUSTEM: An Improved Rule-Based Sundanese Stemmer","authors":"Irwan Setiawan, Hung-Yu Kao","doi":"10.1145/3656342","DOIUrl":"https://doi.org/10.1145/3656342","url":null,"abstract":"<p>Current Sundanese stemmers either ignore reduplication words or define rules to handle only affixes. There is a significant amount of reduplication words in the Sundanese language. Because of that, it is impossible to achieve superior stemming precision in the Sundanese language without addressing reduplication words. This paper presents an improved stemmer for the Sundanese language, which handles affixed and reduplicated words. With a Sundanese root word list, we use a rules-based stemming technique. In our approach, all stems produced by the affixes removal or normalization processes are added to the stem list. Using a stem list can help increase stemmer accuracy by reducing stemming errors caused by affix removal sequence errors or morphological issues. The current Sundanese language stemmer, RBSS, was used as a comparison. Two datasets with 8218 unique affixed words and reduplication words were evaluated. The results show that our stemmer's strength and accuracy have improved noticeably. The use of stem list and word reduplication rules improved our stemmer's affixed type recognition and allowed us to achieve up to 99.30% accuracy.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"65 Suppl 1 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140594747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph4IUR: Incomplete Utterance Rewriting with Semantic Graph","authors":"Zipeng Gao, Jinke Wang, Tong Xu, Zhefeng Wang, Yu Yang, Jia Su, Enhong Chen","doi":"10.1145/3653301","DOIUrl":"https://doi.org/10.1145/3653301","url":null,"abstract":"<p>Utterance rewriting aims to identify and supply the omitted information in human conversation, which further enables the downstream task to understand conversations more comprehensively. Recently, sequence edit methods, which leverage the overlap between two sentences, have been widely applied to narrow the search space confronted by the previous linear generation methods. However, these methods ignore the relationship between linguistic elements in the conversation, which reflects how the knowledge and thoughts are organized in human communication. In this case, although most of the content in rewritten sentences can be found in the context, we found that some connecting words expressing relationships are often missing, which results in the out-of-context problem for the previous sentence edit method. To that end, in this paper, we propose a new semantic Graph-based Incomplete Utterance Rewriting (Graph4IUR) framework, which takes the semantic graph to depict the relationship between linguistic elements and captures out-of-context words. Specifically, we adopt the Abstract Meaning Representation (AMR) [4] graph as the basic sentence-to-graph method to depict the dialogue from the graph perspective, which could well represent the high-level semantics relationships of sentences. Along this line, we further adapt the sentence editing models to rewrite without changing the sentence architecture, which brings a restriction to exploring the overlap part of the current and rewritten sentences in the IUR task. 
Extensive experimental results indicate that our Graph4IUR framework can effectively alleviate the out-of-context problem and improve the performance of the previous edit-based methods in the IUR task.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"14 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140594621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MIMIC: Misogyny Identification in Multimodal Internet Content in Hindi-English Code-Mixed Language","authors":"Aakash Singh, Deepawali Sharma, Vivek Kumar Singh","doi":"10.1145/3656169","DOIUrl":"https://doi.org/10.1145/3656169","url":null,"abstract":"<p>Over the years, social media has emerged as one of the most popular platforms where people express their views and share thoughts about various aspects. The social media content now includes a variety of components such as text, images, videos etc. One type of interest is memes, which often combine text and images. It is relevant to mention here that, social media being an unregulated platform, sometimes also has instances of discriminatory, offensive and hateful content being posted. Such content adversely affects the online well-being of the users. Therefore, it is very important to develop computational models to automatically detect such content so that appropriate corrective action can be taken. Accordingly, there have been research efforts on automatic detection of such content focused mainly on the texts. However, the fusion of multimodal data (as in memes) creates various challenges in developing computational models that can handle such data, more so in the case of low-resource languages. Among such challenges, the lack of suitable datasets for developing computational models for handling memes in low-resource languages is a major problem. This work attempts to bridge the research gap by providing a large-sized curated dataset comprising 5,054 memes in Hindi-English code-mixed language, which are manually annotated by three independent annotators. It comprises two subtasks: (i) Subtask-1 (Binary classification involving tagging a meme as misogynous or non-misogynous), and (ii) Subtask-2 (multi-label classification of memes into different categories). The data quality is evaluated by computing Krippendorff's alpha. 
Different computational models are then applied on the data in three settings: text-only, image-only, and multimodal models using fusion techniques. The results show that the proposed multimodal method using the fusion technique may be the preferred choice for the identification of misogyny in multimodal Internet content and that the dataset is suitable for advancing research and development in the area.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"36 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140594755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Student's Emotion Recognition using Multimodality and Deep Learning","authors":"M. Kalaiyarasi, B. V. V. Siva Prasad, Janjhyam Venkata Naga Ramesh, Ravindra Kumar Kushwaha, Ruchi Patel, Balajee J","doi":"10.1145/3654797","DOIUrl":"https://doi.org/10.1145/3654797","url":null,"abstract":"<p>The goal of emotion detection is to find and recognise emotions in text, speech, gestures, facial expressions, and more. This paper proposes an effective multimodal emotion recognition system based on facial expressions, sentence-level text, and voice. Using public datasets, we examine face expression image classification and feature extraction. The Tri-modal fusion is used to integrate the findings and to provide the final emotion. The proposed method has been verified in classroom students, and the feelings correlate with their performance. This method categorizes students' expressions into seven emotions: happy, surprise, sad, fear, disgust, anger, and contempt. Compared to the unimodal models, the suggested multimodal network design may reach up to 65% accuracy. The proposed method can detect negative feelings such as boredom or loss of interest in the learning environment.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"53 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140594618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cleansing Jewel: A Neural Spelling Correction Model Built On Google OCR-ed Tibetan Manuscripts","authors":"Queenie Luo, Yung-Sung Chuang","doi":"10.1145/3654811","DOIUrl":"https://doi.org/10.1145/3654811","url":null,"abstract":"<p>Scholars in the humanities heavily rely on ancient manuscripts to study history, religion, and socio-political structures of the past. Significant efforts have been devoted to digitizing these precious manuscripts using OCR technology. However, most manuscripts have been blemished over the centuries, making it unrealistic for OCR programs to accurately capture faded characters. This work presents the Transformer + Confidence Score mechanism architecture for post-processing Google’s Tibetan OCR-ed outputs. According to the Loss and Character Error Rate metrics, our Transformer + Confidence Score mechanism architecture proves superior to the Transformer, LSTM-to-LSTM, and GRU-to-GRU architectures. Our method can be adapted to any language dealing with post-processing OCR outputs.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"14 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140594744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learn More Manchu Words with A New Visual-Language Framework","authors":"Zhiwei, Wang, Siyang, Lu, Xiang, Wei, Run, Su, Yingjun, Qi, Wei, Lu","doi":"10.1145/3652992","DOIUrl":"https://doi.org/10.1145/3652992","url":null,"abstract":"<p>Manchu language, a minority language of China, is of significant historical and research value. An increasing number of Manchu documents are digitized into image format for better preservation and study. Recently, many researchers focused on identifying Manchu words in digitized documents. In previous approaches, a variety of Manchu words are recognized based on visual cues. However, we notice that visual-based approaches have some obvious drawbacks. On one hand, it is difficult to distinguish between similar and distorted letters. On the other hand, portions of letters obscured by breakage and stains are hard to identify. To cope with these two challenges, we propose a visual-language framework, namely the Visual-Language framework for Manchu word Recognition (VLMR), which fuses visual and semantic information to accurately recognize Manchu words. Whenever visual information is not available, the language model can automatically associate the semantics of words. The performance of our method is further enhanced by introducing a self-knowledge distillation network. In addition, we created a new handwritten Manchu word dataset named (HMW), which contains 6,721 handwritten Manchu words. The novel approach is evaluated on WMW and HMW. 
The experiments show that our proposed method achieves state-of-the-art performance on both datasets.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"2010 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140314021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of Hybrid Image Processing Based on Artificial Intelligence in Interactive English Teaching","authors":"Dou Xin, Cuiping Shi","doi":"10.1145/3626822","DOIUrl":"https://doi.org/10.1145/3626822","url":null,"abstract":"<p>Primary school English teaching resources play an important role in primary school English teaching. The information age requires that primary school English teaching should strengthen the use of multimedia resources and gradually realize the diversification of teaching content. Expanded reality innovation is a sort of mixture picture handling innovation, which is one of the significant innovations that would influence the improvement of fundamental schooling in the following five years. It can seamlessly output virtual objects to the real environment, which is convenient for this paper to obtain and absorb information. It can also help students to participate in exploration and cultivate their creativity and imagination. It can strengthen the cooperation between students and teachers and create various learning environments. It has an immeasurable prospect of development in the field of education. The primary school English teaching resources based on augmented reality create a realistic learning situation from two-dimensional plane to three-dimensional three-dimensional display, and enrich the presentation of primary school English teaching content. It can stimulate students’ interest in learning English and promote the transformation of English teaching methods. It is a useful attempt in the field of education. This paper made statistics on the test results of the experimental class and the control class. Most of the scores of the experimental group were between 71 and 100, a total of 27, accounting for 67.5%. The score distribution of the control class was relatively balanced, with the highest number between 61-70, and the number was 10, accounting for 25%. 
Therefore, it can be seen that hybrid image processing technology is important for interactive English teaching.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"33 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140313745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Syntax-aware Offensive Content Detection in Low-resourced Code-mixed Languages with Continual Pre-training","authors":"Necva Bölücü, Pelin Canbay","doi":"10.1145/3653450","DOIUrl":"https://doi.org/10.1145/3653450","url":null,"abstract":"<p>Social media is a widely used platform that includes a vast amount of user-generated content, allowing the extraction of information about users’ thoughts from texts. Individuals freely express their thoughts on these platforms, often without constraints, even if the content is offensive or contains hate speech. The identification and removal of offensive content from social media are imperative to prevent individuals or groups from becoming targets of harmful language. Despite extensive research on offensive content detection, addressing this challenge in code-mixed languages remains unsolved, characterised by issues such as imbalanced datasets and limited data sources. Most previous studies on detecting offensive content in these languages focus on creating datasets and applying deep neural networks, such as Recurrent Neural Networks (RNNs), or pre-trained language models (PLMs) such as BERT and its variations. Given the low-resource nature and imbalanced dataset issues inherent in these languages, this study delves into the efficacy of the syntax-aware BERT model with continual pre-training for the accurate identification of offensive content and proposes a framework called Cont-Syntax-BERT by combining continual learning with continual pre-training. Comprehensive experimental results demonstrate that the proposed Cont-Syntax-BERT framework outperforms state-of-the-art approaches. Notably, this framework addresses the challenges posed by code-mixed languages, as evidenced by its proficiency on the DravidianCodeMix [10,19] and HASOC 2109 [37] datasets. 
These results demonstrate the adaptability of the proposed framework in effectively addressing the challenges of code-mixed languages.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"31 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140302332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Context-enhanced Adaptive Graph Network for Time-sensitive Question Answering","authors":"Jitong Li, Shaojuan Wu, Xiaowang Zhang, Zhiyong Feng","doi":"10.1145/3653674","DOIUrl":"https://doi.org/10.1145/3653674","url":null,"abstract":"<p>Time-sensitive question answering is to answer questions limited to certain timestamps based on the given long document, which mixes abundant temporal events with an explicit or implicit timestamp. While existing models make great progress in answering time-sensitive questions, their performance degrades dramatically when a long distance separates the correct answer from the timestamp mentioned in the question. In this paper, we propose a Context-enhanced Adaptive Graph network (CoAG) to capture long-distance dependencies between sentences within the extracted question-related episodes. Specifically, we propose a time-aware episode extraction module that obtains question-related context based on timestamps in the question and document. As the involvement of episodes confuses sentences with adjacent timestamps, an adaptive message passing mechanism is designed to capture and transfer inter-sentence differences. In addition, we present a hybrid text encoder to highlight question-related context built on global information. Experimental results show that CoAG significantly improves compared to state-of-the-art models on five benchmarks. 
Moreover, our model has a noticeable advantage in solving long-distance time-sensitive questions, improving the EM scores by 2.03% to 6.04% on TimeQA-Hard.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"104 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140199927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topic-Aware Masked Attentive Network for Information Cascade Prediction","authors":"Yu Tai, Hongwei Yang, Hui He, Xinglong Wu, Yuanming Shao, Weizhe Zhang, Arun Kumar Sangaiah","doi":"10.1145/3653449","DOIUrl":"https://doi.org/10.1145/3653449","url":null,"abstract":"<p>Predicting information cascades holds significant practical implications, including applications in public opinion analysis, rumor control, and product recommendation. Existing approaches have generally overlooked the significance of semantic topics in information cascades or disregarded the dissemination relations. Such models are inadequate in capturing the intricate diffusion process within an information network inundated with diverse topics. To address such problems, we propose a neural-based model (named ICP-TMAN) using <underline>T</underline>opic-Aware <underline>M</underline>asked <underline>A</underline>ttentive <underline>N</underline>etwork for <underline>I</underline>nformation <underline>C</underline>ascade <underline>P</underline>rediction to predict the next infected node of an information cascade. First, we encode the topical text into user representation to perceive the user-topic dependency. Next, we employ a masked attentive network to devise the diffusion context to capture the user-context dependency. Finally, we exploit a deep attention mechanism to model historical infected nodes for user embedding enhancement to capture user-history dependency. 
The results of extensive experiments conducted on three real-world datasets demonstrate the superiority of ICP-TMAN over existing state-of-the-art approaches.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"68 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140199922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}