{"title":"Fine-tuning text-to-SQL models with reinforcement-learning training objectives","authors":"Xuan-Bang Nguyen , Xuan-Hieu Phan , Massimo Piccardi","doi":"10.1016/j.nlp.2025.100135","DOIUrl":"10.1016/j.nlp.2025.100135","url":null,"abstract":"<div><div>Text-to-SQL is an important natural language processing task that helps users automatically convert natural language queries into formal SQL code. While transformer-based models have pushed text-to-SQL to unprecedented accuracy levels in recent years, such performance is confined to models of very large size that can only be run in specialised clouds. For this reason, in this paper we explore the use of reinforcement learning to improve the performance of models of more conservative size, which can fit within standard user hardware. As reinforcement learning reward, we propose a novel function which better aligns with the text-to-SQL evaluation metrics, applied in conjunction with two strong policy gradient algorithms, REINFORCE and RELAX. Our experimental results over the popular Spider benchmark show that the proposed approach has been able to outperform a conventionally-trained T5 Small baseline by 6.6 pp (percentage points) of exact-set-match accuracy and 4.6 pp of execution accuracy, and a T5 Base baseline by 2.0 pp and 1.9 pp, respectively. 
The proposed model has also performed competitively against ChatGPT instances.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"10 ","pages":"Article 100135"},"PeriodicalIF":0.0,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143510043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AI Linguistics","authors":"Guosheng Zhang","doi":"10.1016/j.nlp.2025.100137","DOIUrl":"10.1016/j.nlp.2025.100137","url":null,"abstract":"<div><div>This research investigates the development of a linguistics for artificial intelligence (AI) to demystify the “black box” of AI. At its core, the language of AI is Embedding: a novel high-dimensional, intelligent language. Embedding exhibits dual characteristics: it operates both as a semantic domain and as a mathematical point. This duality enables Embedding to maintain the discrete, symbolic nature of human languages while facilitating continuous operations in high-dimensional spaces, unlocking significant potential for advanced intelligence. A series of specialized experiments was designed to explore Embedding’s intrinsic properties, including its behavior as a semantic cloud in high-dimensional space, its degrees of freedom, and its spatial transformations. Key findings include the discovery of substantial redundant dimensions in embeddings, confirmation that embeddings lack critical dimensions, and the measurement of engineering dimensions in natural language. This research also establishes the linguistic foundations and application limits of techniques such as dropout strategies, AI model distillation, and scaling laws, among others. Building on these insights, we propose innovative solutions across several fields, including AI architecture design, AI reasoning, domain-based embedding search, and the construction of a multi-intelligence spectrum for embeddings. Ultimately, we introduce a foundational methodology for embedding everything from the real world into the AI world, providing a comprehensive reference framework for the evolution of artificial general intelligence (AGI) and artificial superintelligence (ASI). 
Additionally, this research explores linguistic approaches to the co-evolution of human intelligence and artificial intelligence.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"10 ","pages":"Article 100137"},"PeriodicalIF":0.0,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143520847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aspect-based sentiment classification with BERT and AI feedback","authors":"Lingling Xu, Weiming Wang","doi":"10.1016/j.nlp.2025.100136","DOIUrl":"10.1016/j.nlp.2025.100136","url":null,"abstract":"<div><div>Data augmentation has been widely employed in low-resource aspect-based sentiment classification (ABSC) tasks to alleviate the issue of data sparsity and enhance model performance. Previous data augmentation approaches rely on back translation, synonym replacement, or generative language models such as T5, while the generative power of large language models remains rarely explored. Large language models like GPT-3.5-turbo are trained on extensive datasets and corpora to capture semantic and contextual relationships between words and sentences. To this end, we propose Masked Aspect Term Prediction (MATP), a novel data augmentation method that utilizes the world knowledge and powerful generative capacity of large language models to generate new aspect terms via word masking. By incorporating AI feedback from large language models, MATP increases the diversity and richness of aspect terms. 
Experimental results on the ABSC datasets with BERT as the backbone model show that the introduction of new augmented datasets leads to significant improvements over baseline models, validating the effectiveness of the proposed data augmentation strategy that combines AI feedback.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"10 ","pages":"Article 100136"},"PeriodicalIF":0.0,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143474949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A transformer based multi task learning approach to multimodal hate speech detection","authors":"Prashant Kapil , Asif Ekbal","doi":"10.1016/j.nlp.2025.100133","DOIUrl":"10.1016/j.nlp.2025.100133","url":null,"abstract":"<div><div>Online hate speech has become a major social issue in recent years, affecting both individuals and society as a whole. Memes are a multimodal kind of internet hate speech that is growing more common. Online memes are often entertaining and harmless. A seemingly innocent meme, however, transforms into a multimodal form of hate speech (a hateful meme) when specific types of text, graphics, or combinations of both are used. The spread of these harmful or undesirable memes has the potential to disrupt societal peace. Therefore, it is vital to limit inappropriate memes on social media. Multimodal hate speech identification is an inherently difficult and open question, necessitating joint language understanding, visual perception, and multimodal reasoning. This work advances that line of research by building a multi-task learning-based multimodal system for detecting hateful memes, trained on four hateful meme datasets concurrently. The MTL framework, which consists of Contrastive Language Image Pretraining (CLIP), UNiversal Image-TExt Representation Learning (UNITER), and BERT, was trained collaboratively to transfer common knowledge while training on the four meme datasets simultaneously. The results show that the recommended strategy outperforms unimodal and multimodal approaches on four multilingual benchmark datasets, with considerable AUC-ROC, accuracy, and F1-scores. Ablation studies are undertaken to emphasise the impact of each sub-component of the MTL model. 
The confusion matrix is presented as a quantitative analysis.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"11 ","pages":"Article 100133"},"PeriodicalIF":0.0,"publicationDate":"2025-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143760774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CapsF: Capsule Fusion for Extracting psychiatric stressors for suicide from Twitter","authors":"Mohammad Ali Dadgostarnia , Ramin Mousa , Saba Hesaraki , Mahdi Hemmasian","doi":"10.1016/j.nlp.2025.100134","DOIUrl":"10.1016/j.nlp.2025.100134","url":null,"abstract":"<div><div>Along with factors such as cancer, blood pressure, road accidents and stroke, suicide has been one of Iran’s main causes of death. Psychological stressors are among the main contributors to suicide. Identifying psychological stressors in an at-risk population can help in the early prevention of suicidal behaviours. In recent years, the widespread popularity of social media and its flow of real-time information sharing have allowed for potential early intervention in large-scale and even small-scale populations. Some automated approaches to extracting psychiatric stressors from Twitter have been presented, but most of this research has targeted non-Persian languages. This study investigates techniques for detecting psychiatric stress related to suicide in Persian tweets using learning-based methods. The proposed capsule-based approach achieved a binary classification accuracy of 0.83.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"10 ","pages":"Article 100134"},"PeriodicalIF":0.0,"publicationDate":"2025-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143454121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tweet question classification for enhancing Tweet Question Answering System","authors":"Chindukuri Mallikarjuna, Sangeetha Sivanesan","doi":"10.1016/j.nlp.2025.100130","DOIUrl":"10.1016/j.nlp.2025.100130","url":null,"abstract":"<div><div>In the evolving landscape of social media, effective Question Answering (QA) systems are crucial for enhancing user engagement and satisfaction. Question classification (QC) is vital for improving the efficiency and accuracy of QA systems. Given the informal and noisy nature of social media texts, which differ significantly from general domain QC datasets, there is a strong need for a specialized tweet QC system for social media QA. In this study, we annotated questions in the Tweet QA dataset, performed tweet question classification, and developed the TweetQC dataset, comprising tweet questions with associated labels. We fine-tuned both general and domain-specific pre-trained language models (PTLMs) on the tweet questions. Experimental results show that TweetRoBERTa achieves the highest F1-score of 91.98, outperforming other PTLMs. Additionally, PTLMs trained on the TREC dataset and evaluated on the TweetQC dataset exhibited an accuracy decline of over 35% compared to models trained and evaluated on the TweetQC dataset. Furthermore, incorporating the expected answer type as an additional feature significantly enhances the performance of tweet QA models. 
Experimental results also show that TweetRoBERTa achieves the highest ROUGE-L score when compared with existing models for the Tweet QA system.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"10 ","pages":"Article 100130"},"PeriodicalIF":0.0,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143487687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Token and part-of-speech fusion for pretraining of transformers with application in automatic cyberbullying detection","authors":"Nor Saiful Azam Bin Nor Azmi , Michal Ptaszynski , Fumito Masui , Juuso Eronen , Karol Nowakowski","doi":"10.1016/j.nlp.2025.100132","DOIUrl":"10.1016/j.nlp.2025.100132","url":null,"abstract":"<div><div>Cyberbullying detection remains a significant challenge in the context of expanding internet and social media usage. This study proposes a novel pretraining methodology for transformer models, integrating Part-of-Speech (POS) information with a unique way of tokenization. The proposed model, based on the ELECTRA architecture, undergoes pretraining and fine-tuning and is referred to as ELECTRA_POS. By leveraging linguistic structures, this approach improves understanding of context and subtle meaning in the text. Through evaluation using the GLUE benchmark and a dedicated cyberbullying detection dataset, ELECTRA_POS consistently delivers enhanced performance compared to conventional transformer models. Key contributions include the introduction of POS-token fusion techniques and their application to improve cyberbullying detection, as well as insights into how linguistic features influence transformer-based models. 
The results highlight how integrating POS information into the transformer model improves the detection of harmful online behavior while also benefiting other natural language processing (NLP) tasks.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"10 ","pages":"Article 100132"},"PeriodicalIF":0.0,"publicationDate":"2025-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143386492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparative analysis of encoder only and decoder only models for challenging LLM-generated STEM MCQs using a self-evaluation approach","authors":"Ghada Soliman Ph.D. , Hozaifa Zaki , Mohamed Kilany","doi":"10.1016/j.nlp.2025.100131","DOIUrl":"10.1016/j.nlp.2025.100131","url":null,"abstract":"<div><div>Large Language Models (LLMs) have demonstrated impressive capabilities in various tasks, including Multiple-Choice Question Answering (MCQA) evaluated on benchmark datasets with few-shot prompting. Given the absence of benchmark Science, Technology, Engineering, and Mathematics (STEM) datasets on Multiple-Choice Questions (MCQs) created by LLMs, we employed various LLMs (e.g., Vicuna-13B, Bard, and GPT-3.5) to generate MCQs on STEM topics curated from Wikipedia. We evaluated open-source LLM models such as Llama 2-7B and Mistral-7B Instruct, along with an encoder model such as DeBERTa v3 Large, on inference by adding context in addition to fine-tuning with and without context. The results showed that DeBERTa v3 Large and Mistral-7B Instruct outperform Llama 2-7B, highlighting the potential of LLMs with fewer parameters in answering hard MCQs when given the appropriate context through fine-tuning. We also benchmarked the results of these models against closed-source models such as Gemini and GPT-4 on inference with context, showcasing the potential of narrowing the gap between open-source and closed-source models when context is provided. Our work demonstrates the capabilities of LLMs in creating more challenging tasks that can be used as self-evaluation for other models. 
It also contributes to understanding LLMs’ capabilities in STEM MCQ tasks and emphasizes the importance of context in enhancing the performance of LLMs with fewer parameters.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"10 ","pages":"Article 100131"},"PeriodicalIF":0.0,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143420235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A survey on chatbots and large language models: Testing and evaluation techniques","authors":"Sonali Uttam Singh, Akbar Siami Namin","doi":"10.1016/j.nlp.2025.100128","DOIUrl":"10.1016/j.nlp.2025.100128","url":null,"abstract":"<div><div>Chatbots have developed considerably in recent decades, evolving along with the field of Artificial Intelligence (AI) and enabling powerful capabilities in text generation, summarization, sentiment analysis, and many other Natural Language Processing (NLP) tasks. Advancements in language models (LMs), specifically LLMs, have played an important role in improving the capabilities of chatbots. This survey paper provides a comprehensive overview of chatbots integrated with LLMs, focusing primarily on the testing, evaluation, and performance techniques and frameworks associated with them. The paper discusses the foundational concepts of chatbots and their evolution, and highlights the challenges and opportunities they present by reviewing state-of-the-art papers on chatbot design, testing, and evaluation. The survey also delves into the key components of chatbot systems, including Natural Language Understanding (NLU), dialogue management, and Natural Language Generation (NLG), and examines how LLMs have influenced each of these components. Furthermore, the survey examines the ethical considerations and limitations associated with LLMs. The paper primarily focuses on investigating the evaluation techniques and metrics used to assess the performance and effectiveness of these language models. 
This paper aims to provide an overview of chatbots and highlights the need for an appropriate framework for testing and evaluating chatbots and their underlying LLMs, in order to provide users with reliable knowledge and to improve chatbot quality as the field of machine learning advances.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"10 ","pages":"Article 100128"},"PeriodicalIF":0.0,"publicationDate":"2025-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143139778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning vs. rule-based methods for document classification of electronic health records within mental health care—A systematic literature review","authors":"Emil Rijcken , Kalliopi Zervanou , Pablo Mosteiro , Floortje Scheepers , Marco Spruit , Uzay Kaymak","doi":"10.1016/j.nlp.2025.100129","DOIUrl":"10.1016/j.nlp.2025.100129","url":null,"abstract":"<div><div>Document classification is a widely used task for analyzing mental healthcare texts. This systematic literature review focuses on the document classification of electronic health records in mental healthcare. Over the last decade, there has been a shift from rule-based to machine-learning methods. Despite this shift, no systematic comparison of these two approaches exists for mental healthcare applications. This review examines the evolution, applications, and performance of these methods over time. We find that for most of the last decade, rule-based methods have outperformed machine-learning approaches. However, with the development of more advanced machine-learning techniques, performance has improved. In particular, Transformer-based models enable machine learning approaches to outperform rule-based methods for the first time.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"10 ","pages":"Article 100129"},"PeriodicalIF":0.0,"publicationDate":"2025-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143139777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}