{"title":"Quantifying extreme opinions on Reddit amidst the 2023 Israeli–Palestinian conflict","authors":"Alessio Guerra , Marcello Lepre , Oktay Karakuş","doi":"10.1016/j.nlp.2025.100156","DOIUrl":"10.1016/j.nlp.2025.100156","url":null,"abstract":"<div><div>This study investigates the dynamics of extreme opinions on social media during the 2023 Israeli–Palestinian conflict, utilising a comprehensive dataset of over 450,000 posts from four Reddit subreddits (<em>r/Palestine</em>, <em>r/Judaism</em>, <em>r/IsraelPalestine</em>, and <em>r/worldnews</em>). A lexicon-based, unsupervised methodology was developed to measure “extreme opinions” by considering factors such as anger, polarity, and subjectivity. The analysis identifies significant peaks in extremism scores that correspond to pivotal real-life events, such as the IDF’s bombings of Al Quds Hospital and the Jabalia Refugee Camp, and the end of a ceasefire following a terrorist attack. Additionally, this study explores the distribution and correlation of these scores across different subreddits and over time, providing insights into the propagation of polarised sentiments in response to conflict events. By examining the quantitative effects of each score on extremism and analysing word cloud similarities through Jaccard indices, the research offers a nuanced understanding of the factors driving extreme online opinions. Our findings show that posts exhibiting extreme sentiment surged up to 80% (an increase of 0.3 in extremism score above the average of 0.405 at the end of October) during key conflict events. Compared to recent studies that have not explicitly quantified extremism in an unsupervised manner, we contribute to the literature by addressing this gap through a novel extremism score, derived from sentiment polarity, anger, and subjectivity, to analyse Reddit discourse surrounding the 2023 Israel–Palestine conflict. 
This approach captures the complex interplay between real-world events and online reactions, while acknowledging the inherent challenges of measuring extremism in dynamic social media environments. Our approach also enables scalable monitoring of public sentiment extremity, providing valuable insights for policymakers and conflict researchers.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"11 ","pages":"Article 100156"},"PeriodicalIF":0.0,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144147094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding the effects of human-written paraphrases in LLM-generated text detection","authors":"Hiu Ting Lau, Arkaitz Zubiaga","doi":"10.1016/j.nlp.2025.100151","DOIUrl":"10.1016/j.nlp.2025.100151","url":null,"abstract":"<div><div>Natural Language Generation has been rapidly developing with the advent of large language models (LLMs). While their usage has sparked significant attention from the general public, it is important for readers to be aware when a piece of text is LLM-generated. This has brought about the need for building models that enable automated LLM-generated text detection, with the aim of mitigating potential negative outcomes of such content. Existing LLM-generated detectors show competitive performances in telling apart LLM-generated and human-written text, but this performance is likely to deteriorate when paraphrased texts are considered. In this study, we devise a new data collection strategy to collect Human & LLM Paraphrase Collection (HLPC), a first-of-its-kind dataset that incorporates human-written texts and paraphrases, as well as LLM-generated texts and paraphrases. With the aim of understanding the effects of human-written paraphrases on the performance of SOTA LLM-generated text detectors OpenAI RoBERTa and watermark detectors, we perform classification experiments that incorporate human-written paraphrases, watermarked and non-watermarked LLM-generated documents from GPT and OPT, and LLM-generated paraphrases from DIPPER and BART. 
The results show that the inclusion of human-written paraphrases has a significant impact on LLM-generated text detector performance, improving TPR@1%FPR with a possible trade-off in AUROC and accuracy.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"11 ","pages":"Article 100151"},"PeriodicalIF":0.0,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144069981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic detection of manipulated Bangla news: A new knowledge-driven approach","authors":"Aysha Akther, Kazi Masudul Alam, Rameswar Debnath","doi":"10.1016/j.nlp.2025.100155","DOIUrl":"10.1016/j.nlp.2025.100155","url":null,"abstract":"<div><div>In recent years, the dissemination of misleading news has become easier than ever due to the simplicity of creating and distributing news content on online media platforms. Misleading news detection has become a global topic of interest due to its significant impact on society, economics, and politics. Automatically determining the veracity of news remains challenging because of its diversity and close resemblance to true events. Fake news detection has been studied from different perspectives in many languages. In Bangla, however, existing efforts at fake news detection have generally relied on linguistic style analysis and latent representation-based machine learning and deep learning models, which primarily depend on manually labeled annotations. To address these challenges, we propose a knowledge-based Bangla fake news detection model that does not require model training. In our manipulation detection approach, a news article is automatically labeled as fake or authentic based on an authenticity score that relies on the consistency of knowledge and semantics, underlying sentiment, and the credibility of the news source. We also propose a consistent and context-aware manipulated news generation technique to facilitate the detection of partially manipulated Bangla news. We found the proposed model to be reliable for detecting both fake news and partially manipulated news. We also developed a dataset for Bangla fake news detection that is balanced between authentic and fake news, with news items collected from multiple domains and various news sources. The experimental evaluation of our proposed knowledge-driven approach on the developed dataset shows 97.08% accuracy on fake news detection alone.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"11 ","pages":"Article 100155"},"PeriodicalIF":0.0,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143942113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adapting language generation to dialogue environments and users for task-oriented dialogue systems","authors":"Atsumoto Ohashi, Ryuichiro Higashinaka","doi":"10.1016/j.nlp.2025.100153","DOIUrl":"10.1016/j.nlp.2025.100153","url":null,"abstract":"<div><div>When a natural language generation (NLG) component is implemented in a real-world task-oriented dialogue system, it is necessary to generate not only natural utterances as learned on training data but also utterances adapted to the dialogue environment (e.g., noise from environmental sounds) and the user (e.g., users with low levels of understanding ability). Inspired by recent advances in reinforcement learning (RL) for language generation tasks, we propose ANTOR, a method for <strong>A</strong>daptive <strong>N</strong>atural language generation for <strong>T</strong>ask-<strong>O</strong>riented dialogue via <strong>R</strong>einforcement learning. In ANTOR, a natural language understanding (NLU) module, which corresponds to the user’s understanding of system utterances, is incorporated into the objective function of RL. If the NLG’s intentions are correctly conveyed to the NLU, the NLG is given a positive reward. We conducted experiments on the two major task-oriented dialogue datasets, MultiWOZ and Schema-Guided Dialogue, and we confirmed that ANTOR could generate adaptive utterances against speech recognition errors and the different vocabulary levels of users. 
Further analysis revealed that ANTOR adapts to noisy environments and users with different vocabulary levels by prioritizing words that are less likely to cause speech recognition errors and by using words that match the user’s vocabulary level.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"11 ","pages":"Article 100153"},"PeriodicalIF":0.0,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143928216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel Data Extraction Framework Using Natural Language Processing (DEFNLP) techniques","authors":"Tayyaba Hussain, Muhammad Usman Akram, Anum Abdul Salam","doi":"10.1016/j.nlp.2025.100149","DOIUrl":"10.1016/j.nlp.2025.100149","url":null,"abstract":"<div><div>Evidence derived from data is critical if governments are to address threats faced by the nation, such as pandemics or climate change. Yet many facts needed to inform evidence and science are locked inside publications. We used the scientific literature dataset of the Coleridge Initiative — Show US the Data competition, which challenges data scientists to show how publicly funded data has been used to serve science and society, to discover how data can be used for the public good. In this research, we demonstrate a general Data Extraction Framework Using Natural Language Processing (DEFNLP) techniques. The proposed framework uses NLP libraries and techniques, such as SpaCy and Named Entity Recognition (NER), together with several Hugging Face Question Answering (QA) models, to predict the datasets used in publications. DEFNLP findings can assist the government in immediate decision making, accountability, transparent public investments, and economic and public health benefits. Until now, such a problem involving a large dataset that spans numerous research areas has not been addressed. The approach is domain independent and can therefore be applied to any case study or scenario that requires data extraction. Our methodology sets the state of the art on the Coleridge Initiative dataset: the salti bert QA model reaches the highest score of 0.554 with the lowest runtime (417.4) and smallest output (819 bytes), compared with other QA models such as Longformer (score 0.444; runtime 2710.2; output 1780 bytes) and BigBird (score 0.387; runtime 839.4; output 177020 bytes), and raised the leaderboard score to 0.711. Its computation time to answer each query on CPU is also far lower (0.0696s versus 0.3556s and 0.8967s), and its hyperparameters suit our dataset, with a maximum answer length of 64 and a larger batch size and learning rate. In terms of timing and performance, each epoch took around 5 min on average, with an output size of 3.27kB, again far better than the other frameworks.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"11 ","pages":"Article 100149"},"PeriodicalIF":0.0,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143942112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Q-learning in multi-objective reward model for homophobic and transphobic text classification in low-resource languages: A hypothesis testing framework in multi-objective setting","authors":"Vivek Suresh Raj , Ruba Priyadharshini , Saranya Rajiakodi , Bharathi Raja Chakravarthi","doi":"10.1016/j.nlp.2025.100152","DOIUrl":"10.1016/j.nlp.2025.100152","url":null,"abstract":"<div><div>Most Reinforcement Learning (RL) algorithms optimize a single-objective function, whereas real-world decision-making involves multiple aspects. For hate comment classification, an agent must balance maximizing the F1-score while minimizing False Positives (FP) to enhance precision and reduce misclassifications. However, such multi-objective optimization introduces uncertainties in decision-making. To address this, we propose a Bayesian Q-Learning framework with a convolutional neural network policy. The policy outputs action logits, integrated with Q-value estimates sampled via Thompson Sampling from a Gaussian posterior. Our reward function combines F1-score (objective 1) and a penalty for misclassification (objective 2) to optimize learning. To validate our framework, we first show that it classifies hate comments better than other baselines, scoring F1-scores of 83%, 93%, 77%, and 71% on the English-Tamil, English, Kannada, and Malayalam datasets for detecting homophobic and transphobic comments, respectively. Second, we demonstrate that the variance of Q-value estimates in our Bayesian posterior decreases significantly over time, indicating that the agent has learned an optimal policy that effectively balances the competing objectives. This finding is further supported by statistical t-tests conducted across all datasets, which confirm the significance of the observed variance reduction. Additionally, we observe our agent’s multi-objective optimization path in 3D space, which shows its ability to balance reward (F1-score) and regret. Furthermore, we compare action selection between our Bayesian approach and non-Bayesian action clustering using the K-Means algorithm; our analysis highlights coherent clustering, indicating structured exploration, whereas the non-Bayesian approach shows premature convergence to suboptimal policies.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"11 ","pages":"Article 100152"},"PeriodicalIF":0.0,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143924105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"It is all in the [MASK]: Simple instruction-tuning enables BERT-like masked language models as generative classifiers","authors":"Benjamin Clavié, Nathan Cooper, Benjamin Warner","doi":"10.1016/j.nlp.2025.100150","DOIUrl":"10.1016/j.nlp.2025.100150","url":null,"abstract":"<div><div>While encoder-only models such as BERT and ModernBERT are ubiquitous in real-world NLP applications, their conventional reliance on task-specific classification heads can limit their applicability compared to decoder-based large language models (LLMs). In this work, we introduce ModernBERT-Large-Instruct, a 0.4B-parameter encoder model that leverages its masked language modeling (MLM) head for generative classification. We design a simple approach, extracting all single-token answers from the FLAN dataset collection and re-purposing standard MLM pre-training to mask only this single-token answer. Our approach employs an intentionally simple training loop and inference mechanism that requires no heavy pre-processing, heavily engineered prompting, or architectural modifications. ModernBERT-Large-Instruct exhibits strong zero-shot performance on both classification and knowledge-based tasks, outperforming similarly sized LLMs on MMLU and achieving 93% of Llama3-1B’s MMLU performance with 60% fewer parameters. We also demonstrate that, when fine-tuned, the generative approach using the MLM head matches or even surpasses traditional classification-head methods across diverse NLU tasks. This capability emerges specifically in models trained on contemporary, diverse data mixes, with models trained on lower-volume, less-diverse data yielding considerably weaker performance. Although preliminary, these results demonstrate the potential of using the original generative masked language modeling head over traditional task-specific heads for downstream tasks.
Our work suggests that further exploration into this area is warranted, highlighting many avenues for future improvements.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"11 ","pages":"Article 100150"},"PeriodicalIF":0.0,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143937154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bag-of-Word approach is not dead: A performance analysis on a myriad of text classification challenges","authors":"Mario Graff , Daniela Moctezuma , Eric S. Téllez","doi":"10.1016/j.nlp.2025.100154","DOIUrl":"10.1016/j.nlp.2025.100154","url":null,"abstract":"<div><div>The Bag-of-Words (BoW) representation, enhanced with a classifier, was a pioneering approach to solving text classification problems. However, with the advent of transformers and, in general, deep learning architectures, the field has dynamically shifted its focus towards customizing these architectures for various natural language processing tasks, including text classification problems. A newcomer might not realize that for some text classification problems, the traditional approach is still competitive. This research analyzes the competitiveness of BoW-based representations in different text-classification competitions run in English, Spanish, and Italian. To analyze the performance of these BoW-based representations, we participated in 12 international text classification competitions, totalling 24 tasks: five in English, seven in Italian, and twelve in Spanish. The results show that the proposed BoW representations fall within just 10% of the competition winner, and within less than 2% on three author-profiling tasks. Moreover, BoW outperforms BERT-based solutions and dominates in author profiling tasks.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"11 ","pages":"Article 100154"},"PeriodicalIF":0.0,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143924104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Financial sentiment analysis for pre-trained language models incorporating dictionary knowledge and neutral features","authors":"Yongyong Sun, Haiping Yuan, Fei Xu","doi":"10.1016/j.nlp.2025.100148","DOIUrl":"10.1016/j.nlp.2025.100148","url":null,"abstract":"<div><div>With increasing financial market complexity, accurate sentiment analysis of financial texts has become crucial. Traditional methods often misinterpret financial terminology and show high error rates in neutral sentiment recognition. This study aims to improve financial sentiment analysis accuracy through developing EnhancedFinSentiBERT, a model incorporating financial domain pre-training, dictionary knowledge embedding, and neutral feature extraction. Experiments on the FinancialPhraseBank, FiQA and Headline datasets demonstrate the model’s superior performance compared to mainstream methods, particularly in neutral sentiment recognition. Ablation analysis reveals that dictionary knowledge embedding and neutral feature extraction contribute most significantly to model improvement.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"11 ","pages":"Article 100148"},"PeriodicalIF":0.0,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143869774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OVALYTICS: Enhancing Offensive Video Detection with YouTube Transcriptions and Advanced Language Models","authors":"Sneha Chinivar , Roopa M.S. , Arunalatha J.S. , Venugopal K.R.","doi":"10.1016/j.nlp.2025.100147","DOIUrl":"10.1016/j.nlp.2025.100147","url":null,"abstract":"<div><div>The exponential growth of offensive content online underscores the need for robust content moderation. In response, this work presents OVALYTICS (Offensive Video Analysis Leveraging YouTube Transcriptions with Intelligent Classification System), a comprehensive framework that introduces novel integrations of advanced technologies for offensive video detection. Unlike existing approaches, OVALYTICS uniquely combines Whisper AI for accurate audio-to-text transcription with state-of-the-art large language models (LLMs) such as BERT, ALBERT, XLM-R, MPNet, and T5 for semantic analysis. The framework also features a newly curated dataset tailored for fine-grained evaluation, achieving significant improvements in accuracy and F1-scores over traditional methods and advancing the state of automated content moderation.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"11 ","pages":"Article 100147"},"PeriodicalIF":0.0,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}