{"title":"RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts","authors":"V. Indumathi, S. SanthanaMegala","doi":"10.5121/ijnlc.2023.12303","DOIUrl":"https://doi.org/10.5121/ijnlc.2023.12303","url":null,"abstract":"Cyberbullying is currently one of the most important research fields. The majority of researchers have contributed to research on bully-text identification in English texts or comments; due to the scarcity of data, analyzing and stemming Tamil text is frequently a tedious job. Tamil is a morphologically diverse and agglutinative language, and the creation of a Tamil stemmer is not an easy undertaking. After examining the major difficulties encountered, the rule-based iterative preprocessing algorithm (RBIPA) is proposed. In this attempt, Tamil morphemes and lemmas were extracted using the suffix stripping technique, together with a supervised machine learning algorithm to classify words as pronouns and proper nouns. The novelty of the proposed system is the development of a preprocessing algorithm for iterative stemming and a lemmatization process for discovering exact words from Tamil-language comments. RBIPA shows 84.96% accuracy on the given test dataset, which has a total of 13,000 words.","PeriodicalId":179392,"journal":{"name":"International Journal on Natural Language Computing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133766674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TEXT SUMMARIZATION IN MONGOLIAN LANGUAGE","authors":"Chuluundorj Begz","doi":"10.5121/ijnlc.2023.12206","DOIUrl":"https://doi.org/10.5121/ijnlc.2023.12206","url":null,"abstract":"In this new era, different areas of social communication accumulate enormous amounts of textual data, and it is difficult to manually extract a summary from such large data. Therefore, it is important to develop methods for searching and absorbing relevant information, selecting important sentences and paragraphs from large texts, and summarizing texts by finding the topics of the text and applying frequency-based clustering of sentences. In this paper, the author presents some ideas on using mathematical models to condense the source text into a shorter version that preserves its semantics, together with a graph-based approach for text summarization in the Mongolian language.","PeriodicalId":179392,"journal":{"name":"International Journal on Natural Language Computing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116783553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SIMPLE SENTENCE IN A·WE","authors":"Amanda Aski Macdonald Momin","doi":"10.5121/ijnlc.2023.12208","DOIUrl":"https://doi.org/10.5121/ijnlc.2023.12208","url":null,"abstract":"A·we is the standard variety of the A·chik language, also known as A·chikku, which is also commonly known as the Garo language. The Garo language belongs to the Tibeto-Burman branch of the Sino-Tibetan language family. Since syntax involves arranging words to create logical phrases, clauses, and sentences, the simple sentence is an important part of syntax, and thus knowing about the simple sentence in A·we forms the basis of writing correct sentences in this variety of the Garo language. Noam Chomsky, the famous linguist, used the phrase “colorless green ideas sleep furiously” in his book Syntactic Structures (1957) as an example of a sentence which is syntactically and grammatically correct, because it has the correct word order and the verb is consistent with the subject, but is semantically incorrect. Chomsky (1957) thus illustrates that the rules governing syntax are different from the meaning conveyed by words. We can observe that there are a subject and a predicate in a simple sentence in language, which is the same for A·we. It is not essential that a simple sentence be a short sentence; it is also possible to write a long simple sentence with only one predicate used with a number of subjects. Such sentences are still called simple sentences. In this paper, we discuss some classifications of the simple sentence in A·we, which will further contribute to the study of syntax in A·we as well as aid in constructing proper sentences in the language.","PeriodicalId":179392,"journal":{"name":"International Journal on Natural Language Computing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129558026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LEXIS AND SYNTAX OF MEDICINE PRODUCT WARNINGS IN THE PHILIPPINES","authors":"Shielanie Soriano Dacumos","doi":"10.5121/ijnlc.2023.12204","DOIUrl":"https://doi.org/10.5121/ijnlc.2023.12204","url":null,"abstract":"In the Philippines, parents refused anti-measles and anti-dengue vaccines for their children, which created a medical outbreak. This might not have happened if product warnings had been given and explained to the parents. Indeed, product warnings are found to be in an optimal position for safeguarding the lives of consumer-patients. This paper anatomizes the lexical features of medicine product warnings in the Philippines, which are crucial in response discourses. A range of linguistic frameworks were applied and significant findings were drawn. Gaps were identified in the use of noun abstractness, synthetic personalization, field continuum, adjectives, and adverbs. Such an investigation brought out the transparency of the communicative features of medicine safety texts. In the end, linguistic components have a vital impact on the legal content adequacy of medicine product warnings, unfolding the importance of these messages in facilitating informed decision-making among consumer-patients.","PeriodicalId":179392,"journal":{"name":"International Journal on Natural Language Computing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125948903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding Chinese Moral Stories with Further Pre-Training","authors":"Jing Qian, Yong Yue, Katie Atkinson, Gangmin Li","doi":"10.5121/ijnlc.2023.12201","DOIUrl":"https://doi.org/10.5121/ijnlc.2023.12201","url":null,"abstract":"The goal of moral understanding is to grasp the theoretical concepts embedded in a narrative by delving beyond the concrete occurrences and dynamic personas. Specifically, the narrative is compacted into a single statement without involving any characters within the original text, necessitating a more astute language model that can comprehend connotative morality and exhibit commonsense reasoning. The “pre-training + fine-tuning” paradigm is widely embraced in neural language models. In this paper, we propose an intermediary phase to establish an improved paradigm of “pre-training + further pre-training + fine-tuning”. Further pre-training generally refers to continual learning on task-specific or domain-relevant corpora before being applied to target tasks, which aims at bridging the gap in data distribution between the phases of pre-training and fine-tuning. Our work is based on a Chinese dataset named STORAL-ZH that consists of 4k human-written story-moral pairs. Furthermore, we design a two-step process of domain-adaptive pre-training in the intermediary phase. The first step relies on a newly collected Chinese dataset of Confucian moral culture, and the second step is based on the Chinese version of a frequently used commonsense knowledge graph (i.e., ATOMIC) to enrich the backbone model with inferential knowledge beyond morality. By comparison with several advanced models, including BERT-base, RoBERTa-base, and T5-base, experimental results on two understanding tasks demonstrate the effectiveness of our proposed three-phase paradigm.","PeriodicalId":179392,"journal":{"name":"International Journal on Natural Language Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129786233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NEW TRENDS IN LESS-RESOURCED LANGUAGE PROCESSING: CASE OF AMAZIGH LANGUAGE","authors":"Fadoua ATAA ALLAH, S. Boulaknadel","doi":"10.5121/ijnlc.2023.12207","DOIUrl":"https://doi.org/10.5121/ijnlc.2023.12207","url":null,"abstract":"The coronavirus (COVID-19) pandemic has dramatically changed lifestyles in much of the world. It forced people to profoundly review their relationships and interactions with digital technologies. Nevertheless, people prefer using these technologies in their favorite languages. Unfortunately, most languages are considered low- or less-resourced, and they do not have the potential to keep up with the new needs. Therefore, this study explores how this kind of language, mainly Amazigh, will behave in a wholly digital environment, and what to expect from new trends. Contrary to past decades, the research gap for low- and less-resourced languages is continually shrinking. Nonetheless, the literature review unveils the need for innovative research to revise their informatization roadmap, while rethinking, in a valuable way, people’s behaviors in this increasingly changing environment. Through this work, we first introduce the technology-access challenges and explain how natural language processing contributes to overcoming them. Then, we give an overview of existing studies and research related to under- and less-resourced languages’ informatization, with an emphasis on the Amazigh language. Afterwards, based on these studies and the agile revolution, a new roadmap is presented.","PeriodicalId":179392,"journal":{"name":"International Journal on Natural Language Computing","volume":"152 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114400630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Robust Joint-Training Graph Neural Networks Model for Event Detection with Symmetry and Asymmetry Noisy Labels","authors":"Mingxiang Li, Huang Xing, Tengyun Wang, Jiaxuan Dai, Kaiming Xiao","doi":"10.5121/ijnlc.2023.12209","DOIUrl":"https://doi.org/10.5121/ijnlc.2023.12209","url":null,"abstract":"Events are the core element of information in descriptive corpora. Although much progress has been made in Event Detection (ED), it is still a challenge in Natural Language Processing (NLP) to detect event information from data with unavoidable noisy labels. A robust Joint-training Graph Convolution Networks (JT-GCN) model is proposed in this paper to meet the challenge of ED tasks with noisy labels. Specifically, we first employ two Graph Convolution Networks with Edge Enhancement (EE-GCN) to make predictions simultaneously. A joint loss combining the detection loss and the contrast loss from the two networks is then calculated for training. Meanwhile, a small-loss selection mechanism is introduced to mitigate the impact of mislabeled samples in the networks' training process. These two networks gradually reach an agreement on the ED tasks as joint training progresses. Corrupted data with label noise are generated from the benchmark dataset ACE2005. Experiments on ED tasks have been conducted with both symmetric and asymmetric label noise at different levels. The experimental results show that the proposed model is robust to the impact of label noise and superior to state-of-the-art models for ED tasks.","PeriodicalId":179392,"journal":{"name":"International Journal on Natural Language Computing","volume":"148 1-2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114034124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Location-based Sentiment Analysis of 2019 Nigeria Presidential Election using a Voting Ensemble Approach","authors":"I. Onyenwe, Samuel N. C. Nwagbo, Ebele Onyedinma Onyedinma, Onyedika Ikechukwu-Onyenwe Onyenwe, Chidinma A. Nwafor, Obinna Agbata","doi":"10.5121/ijnlc.2023.12101","DOIUrl":"https://doi.org/10.5121/ijnlc.2023.12101","url":null,"abstract":"Nigerian president Buhari defeated his closest rival, Atiku Abubakar, by over 3 million votes. He was issued a Certificate of Return and was sworn in on 29 May 2019. However, there were claims of widespread fraud by the opposition. Sentiment analysis captures the opinions of the masses over social media for global events. In this paper, we use 2019 Nigeria presidential election tweets to perform sentiment analysis through the application of a voting ensemble approach (VEA), in which the predictions from multiple techniques are combined to find the best polarity of a tweet (sentence). This is to determine public views on the 2019 Nigeria presidential elections and compare them with actual election results. Our sentiment analysis experiment is focused on location-based viewpoints, for which we used Twitter location data. For this experiment, we live-streamed Nigeria 2019 election tweets via the Twitter API to create a dataset of 583,816 tweets, pre-processed the data, and applied the VEA, utilizing three different sentiment classifiers to obtain the best polarity of a given tweet. Furthermore, we segmented our tweets dataset into Nigerian states and geopolitical zones, then plotted state-wise and zone-wise user sentiments towards Buhari and Atiku and their political parties. The overall objective of using states/geopolitical zones is to evaluate the similarity between the sentiment of location-based tweets and actual election results. The results reveal that whereas in most cases the election outcomes coincide with the sentiment expressed on Twitter, as shown by the polarity scores of different locations, there are also some election results where our location-analysis similarity test failed.","PeriodicalId":179392,"journal":{"name":"International Journal on Natural Language Computing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130356659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Contrastive Study of the Negation Morphemes (English, Kurdish and Arabic)","authors":"Safia Zivingi","doi":"10.5121/ijnlc.2022.11504","DOIUrl":"https://doi.org/10.5121/ijnlc.2022.11504","url":null,"abstract":"This paper studies the problems facing languages (English, Kurdish, and Arabic) due to the lack of symmetry between the negation morphemes in these languages, which differ in their number, the specifics of their uses, and the multiplicity of their meanings, at two levels: general language and scientific terms. The negation morphemes in the Arabic and Kurdish languages are located at the beginning of the word as prefixes. That is, neither of them has infixes or suffixes that indicate negation morphologically, that is, at the level of the word or the lexical unit; grammatically, negation operates at the level of the sentence. These negation morphemes may occur in the middle of the sentence in Kurdish, but in Arabic the negation morphemes are located at the front of the sentence. Most of the negation affixes in English are also prefixes, and a few are suffixes denoting negation, the most prominent of which is (-less); English is also devoid of negation infixes. The lack of equivalence between the negation morphemes of quantitative and qualitative languages leads to chaos and disorder when transferring between languages. This necessitates establishing rules regulating the work of these multiple morphemes for the function of negation, at the levels of morphological structure, syntactic and semantic features, and the nature of their uses, both at the level of the language itself and at the level of contrast between the sending and receiving languages.","PeriodicalId":179392,"journal":{"name":"International Journal on Natural Language Computing","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131117883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Chess and NIM with Transformers","authors":"Michael Deleo, Erhan Guven","doi":"10.5121/ijnlc.2022.11501","DOIUrl":"https://doi.org/10.5121/ijnlc.2022.11501","url":null,"abstract":"Representing a board game’s state space, actions, and transition model by text-based notation enables a wide variety of NLP applications suited to the strengths of language models. These few shot language models can help gain insight into a variety of interesting problems such as learning the rules of a game, detecting player behavior patterns, player attribution, and ultimately learning the game in an unsupervised manner. In this study, we firstly applied the BERT model to the simple combinatorial Nim game to analyze BERT’s performance in the varying presence of noise. We analyzed the model’s performance versus three agents, namely Nim Guru, a Random player, and a Q-learner. Secondly, we applied the BERT model to the game of chess through a large set of high ELO stockfish games with exhaustive encyclopedia openings. Finally, we have shown that model practically learns the rules of the Nim and chess, and have shown that it can competently play against its opponent and in some interesting conditions win.","PeriodicalId":179392,"journal":{"name":"International Journal on Natural Language Computing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128256718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}