Natural Language Processing Journal — Latest Articles

Enhancing grammatical documentation for endangered languages with graph-based meaning representation and Loopy Belief Propagation
Natural Language Processing Journal Pub Date: 2025-06-18 DOI: 10.1016/j.nlp.2025.100164
Sebastien Christian

DIG4EL (Digital Inferential Grammars for Endangered Languages) is a method embodied in software designed to assist linguists and teachers in producing grammatical descriptions for endangered languages. DIG4EL integrates linguistic knowledge from extensive databases such as WALS and Grambank with automated observations of controlled data collected using Conversational Questionnaires.
Linguistic knowledge and automated observations provide priors to a Bayesian network of grammatical parameters, where parameters are interconnected by directional conditional probability matrices derived from statistics on world languages. Inference of unknown parameter values is performed using Loopy Belief Propagation, achieving an average accuracy of 76% and a median accuracy of 85% in an experimental grammatical domain: determining the values of eight parameters related to canonical word order across 116 languages from diverse language families.
DIG4EL produces outputs as structured files for computational use, Microsoft Word files, or plain-language grammatical descriptions generated by a Large Language Model. These descriptions rely solely on vetted data and observed examples, with prompts crafted explicitly to prevent the use of external information and hallucinations.
By leveraging probabilistic modeling and rich yet quickly assembled linguistic data, DIG4EL provides a powerful, accessible tool for creating grammatical descriptions and language teaching materials with minimal intervention from linguists. It significantly reduces the time and expertise required by traditional documentation workflows, helping ensure that endangered languages are better documented and taught.

Citations: 0
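The inference step described above, Loopy Belief Propagation over a network of interdependent grammatical parameters, can be illustrated with a minimal sketch. Everything below is invented for illustration (a three-parameter binary cycle with made-up priors and couplings), not DIG4EL's actual parameter set or probabilities:

```python
# Loopy belief propagation on a tiny cycle of three binary "parameters".
# Unary potentials stand in for typological priors; the pairwise potential
# encodes an (invented) tendency for neighbouring parameters to agree.
nodes = [0, 1, 2]
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1]}   # a cycle, hence "loopy"
unary = {0: [0.9, 0.1],   # node 0: strong prior for state 0
         1: [0.5, 0.5],   # nodes 1 and 2: unknown, uniform priors
         2: [0.5, 0.5]}
pair = [[0.8, 0.2],       # attractive coupling between neighbours
        [0.2, 0.8]]

# Messages m[(i, j)][x_j]: what node i tells node j about j's states.
msgs = {(i, j): [0.5, 0.5] for i in nodes for j in neighbours[i]}

for _ in range(50):                # fixed-point iteration of sum-product updates
    new = {}
    for (i, j) in msgs:
        out = []
        for xj in (0, 1):
            total = 0.0
            for xi in (0, 1):
                incoming = 1.0
                for k in neighbours[i]:
                    if k != j:
                        incoming *= msgs[(k, i)][xi]
                total += unary[i][xi] * pair[xi][xj] * incoming
            out.append(total)
        z = sum(out)
        new[(i, j)] = [v / z for v in out]
    msgs = new

def belief(i):
    """Approximate marginal of node i from its unary potential and incoming messages."""
    b = list(unary[i])
    for k in neighbours[i]:
        b = [b[x] * msgs[(k, i)][x] for x in (0, 1)]
    z = sum(b)
    return [v / z for v in b]

print(belief(1))   # node 1's uniform prior is pulled toward state 0 by node 0
```

In DIG4EL's setting the analogue would be many-valued parameters connected by conditional probability matrices estimated from WALS/Grambank statistics, but the update rule is the same sum-product message passing.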
Next-generation image captioning: A survey of methodologies and emerging challenges from transformers to Multimodal Large Language Models
Natural Language Processing Journal Pub Date: 2025-06-10 DOI: 10.1016/j.nlp.2025.100159
Huda Diab Abdulgalil, Otman A. Basir

The widespread availability of visual data on the Internet has fueled significant interest in image-to-text captioning systems. Automated image captioning remains a challenging multimodal analytics task, integrating advances in both Computer Vision (CV) and Natural Language Processing (NLP) to understand image content and generate semantically meaningful textual descriptions. Modern deep learning-based approaches have supplanted traditional approaches in image captioning, leading to more efficient and sophisticated models. The development of attention mechanisms and transformer-based architectures has further enhanced the modeling of both language and visual data. Despite these gains, challenges such as long-tailed object recognition, bias in training data, and shortcomings in evaluation metrics constrain the capabilities of current models. Furthermore, an important breakthrough has come with the recent emergence of Multimodal Large Language Models (MLLMs). By incorporating textual and visual data, MLLMs provide improved captioning flexibility, generative capabilities, and reasoning. However, these models introduce new challenges, including faithfulness, grounding, and computational cost. Relatively few studies have comprehensively surveyed these developments; this paper provides a thorough analysis of Transformer-based captioning approaches, investigates the shift to MLLMs, and discusses associated challenges and opportunities. We also present a performance comparison of the latest models on the MS-COCO benchmark and conclude with perspectives on potential future research directions.

Citations: 0
Cross-lingual embedding methods and applications: A systematic review for low-resourced scenarios
Natural Language Processing Journal Pub Date: 2025-06-09 DOI: 10.1016/j.nlp.2025.100157
Thapelo Sindane, Vukosi Marivate, Abiodun Modupe

The field of Natural Language Processing (NLP) has achieved significant success in various areas, such as developing large-scale datasets, algorithmic complexity, optimized computing capabilities, and refined individual and community expertise, particularly in languages such as English, French, and Spanish. However, such Global North-centred strides have inadvertently created a substantial representation bias against many languages categorized as low-resourced, the majority being African languages. As a result, rudimentary resources such as stopwords, lemmatizers, stemmers, and word embeddings, as well as advanced multilingual transformer-based models, remain under-developed for these languages. Compounding these circumstances is the lack of insight into how such resources should be developed in the low-resourced context (e.g., how to develop embeddings for morphologically rich languages). Over time, research priorities aimed at creating these resources shifted, largely because of the high cost of remedying these issues, leading to the rise of alternative methods such as cross-lingual transfer learning (CLTL). CLTL involves transferring domain knowledge gained from supervised training to a domain with limited supervision signals. This study conducts a systematic literature review of CLTL techniques, in the context of cross-lingual models and embeddings, examining their mathematical foundations, application domains, evaluation metrics, languages covered, and the latest developments. The findings offer valuable insights into the present state of CLTL techniques, identifying areas for future research and development to advance cross-lingual natural language processing applications specifically in low-resourced settings.

Citations: 0
Intermediate-task transfer learning for Indonesian NLP tasks
Natural Language Processing Journal Pub Date: 2025-06-03 DOI: 10.1016/j.nlp.2025.100161
Adrianus Saga Ekakristi, Alfan Farizki Wicaksono, Rahmad Mahendra

Transfer learning, a common technique in recent Natural Language Processing (NLP) research, involves pre-training a model on a large, unlabeled dataset using self-supervised methods and then fine-tuning it on a smaller, labeled dataset for a specific task. Recent studies have demonstrated that introducing an additional training step between pre-training and fine-tuning can further enhance model performance. This method is called intermediate-task transfer learning (ITTL). Although this approach can potentially improve performance on the target task, choosing the intermediate task that yields the highest performance increase remains challenging. Furthermore, despite extensive research on intermediate training methods in English NLP, the application of these techniques to Indonesian language processing is still relatively understudied. In this study, we apply the ITTL method to nine Indonesian NLP datasets, using each as both intermediate and target task, to investigate its behavior. Furthermore, we show that linear regression analysis can effectively identify factors that maximize performance improvement in target tasks when using ITTL. Our experiments reveal that ITTL enhances F1 score performance in the majority of cases, provided suitable intermediate tasks are selected. Specifically, our best-performing model achieved performance gains on 9 out of 10 target task sets, with improvements reaching up to 18.6%. Our detailed analysis indicates that factors such as task type matching, task complexity, vocabulary size, and dataset size significantly influence the effectiveness of ITTL on target task performance. This research shows that ITTL, coupled with our proposed guidelines for intermediate task selection, offers a promising training paradigm for Indonesian NLP.

Citations: 0
Advances in machine transliteration methods, limitations, challenges, applications and future directions
Natural Language Processing Journal Pub Date: 2025-06-01 DOI: 10.1016/j.nlp.2025.100158
A’la Syauqi, Aji Prasetya Wibawa

Machine transliteration is critical in natural language processing (NLP), facilitating script conversion while preserving phonetic integrity across diverse languages. Using the PRISMA framework, this review analyzes 73 selected studies on machine transliteration, covering both methodological advancements and its role in NLP applications. Among these, 37 studies focus on transliteration methods (rule-based, statistical, machine learning, hybrid, and semantic), while 32 studies explore their application in NLP tasks such as machine translation, sentiment analysis, and text normalization. Rule-based methods provide structured frameworks but face challenges in adapting to linguistic variability. Statistical techniques demonstrate robustness yet depend heavily on the availability of parallel corpora. Machine learning models leverage neural architectures to achieve high accuracy but are constrained by data scarcity for low-resource languages. Hybrid approaches integrate multiple methodologies, while semantic knowledge-based models enhance accuracy by incorporating linguistic features. The review highlights transliteration’s role in NLP applications such as machine translation, sentiment analysis, and text normalization, which are critical for improving multilingual language accessibility. Findings show that machine learning-based approaches dominate transliteration research (32 of 73 studies), followed by rule-based and hybrid methods. These approaches contribute to improving multilingual accessibility and NLP performance. By synthesizing advancements and identifying challenges, this study provides actionable insights for researchers and practitioners, enabling the development of more efficient and inclusive transliteration systems that support linguistic diversity and advance multilingual NLP technologies. The review also identifies gaps in addressing underrepresented languages such as Javanese, where complex character sets, orthographic rules, and scriptio continua remain underexplored.

Citations: 0
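Of the method families surveyed above, the rule-based approach is the simplest to make concrete: transliteration as longest-match lookup in a hand-written mapping table. The toy table below is a fragment of an informal Cyrillic-to-Latin romanization, chosen purely for illustration and not taken from the review:

```python
# Rule-based transliteration: longest-match lookup in a mapping table.
# The table is a toy romanization fragment, not a complete standard.
RULES = {
    "щ": "shch", "ш": "sh", "ч": "ch", "ж": "zh",
    "а": "a", "б": "b", "в": "v", "и": "i", "к": "k",
    "н": "n", "о": "o", "р": "r", "т": "t",
}

def transliterate(text: str) -> str:
    out, i = [], 0
    keys = sorted(RULES, key=len, reverse=True)   # try longest source units first
    while i < len(text):
        for k in keys:
            if text.startswith(k, i):
                out.append(RULES[k])
                i += len(k)
                break
        else:                                     # no rule matched: copy through
            out.append(text[i])
            i += 1
    return "".join(out)

print(transliterate("борщ"))   # -> "borshch"
```

Longest-match ordering matters once rules share prefixes (multi-character source units), which hints at why scripts with complex character sets and scriptio continua, such as Javanese, are hard for purely rule-based systems.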
A multi-way parallel named entity annotated corpus for English, Tamil and Sinhala
Natural Language Processing Journal Pub Date: 2025-06-01 DOI: 10.1016/j.nlp.2025.100160
Surangika Ranathunga, Asanka Ranasinghe, Janaka Shamal, Ayodya Dandeniya, Rashmi Galappaththi, Malithi Samaraweera

This paper presents a multi-way parallel English-Tamil-Sinhala corpus annotated with Named Entities (NEs), where Sinhala and Tamil are low-resource languages. Using pre-trained multilingual Language Models (mLMs), we establish new benchmark Named Entity Recognition (NER) results on this dataset for Sinhala and Tamil. We also carry out a detailed investigation of the NER capabilities of different types of LMs. Finally, we demonstrate the utility of our NER system on a low-resource Neural Machine Translation (NMT) task. Our dataset is publicly released: https://github.com/suralk/multiNER.

Citations: 0
Categorizing Mental Stress: A Consistency-Focused Benchmarking of ML and DL Models for Multi-Label, Multi-Class Classification via Taxonomy-Driven NLP Techniques
Natural Language Processing Journal Pub Date: 2025-06-01 DOI: 10.1016/j.nlp.2025.100162
Juswin Sajan John, Boppuru Rudra Prathap, Gyanesh Gupta, Jaivanth Melanaturu

Mental stress, a critical concern worldwide, necessitates precise and nuanced characterization. This study introduces a novel approach to characterizing mental stress through a multi-label, multi-class classification framework built on natural language processing techniques. Building on existing literature and discussions with psychologists and other mental health practitioners, we developed a taxonomy of 27 distinctive markers spread across four label categories, aiming to create a preliminary screening tool leveraging textual data.
The core objective is to identify the most suitable model for this complex task, encompassing a comprehensive evaluation of various machine learning and deep learning algorithms. We experimented with support vector machine (SVM), random forest (RF), and long short-term memory (LSTM) algorithms, incorporating various feature combinations involving Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Dirichlet Allocation (LDA). The best performer of this comparative study was further evaluated against an LLM.
The potential of large language models (LLMs), including their language understanding and prediction capabilities, is another key focus. We explore how these models could augment and advance mental health research, offering new perspectives and insights into the characterization of mental stress.
Our findings show that the top model, an LSTM with TF-IDF and LDA features (with class weights assigned), outperformed the PaLM model in consistency, achieving a coefficient of variation as low as 0.87% across all labels. Despite the PaLM model’s superior average performance, it exhibited higher variability across labels.

Citations: 0
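The consistency measure behind the headline number above, the coefficient of variation across per-label scores, is straightforward to reproduce. The per-label F1 values below are made up for illustration, not the paper's results:

```python
import statistics

def coefficient_of_variation(scores):
    """CV = population std / mean; lower means more consistent across labels."""
    return statistics.pstdev(scores) / statistics.mean(scores)

# Hypothetical per-label F1 scores for two models.
lstm_f1 = [0.81, 0.80, 0.82, 0.81]   # steady across labels
palm_f1 = [0.95, 0.60, 0.88, 0.70]   # higher peak, but more variable

print(round(coefficient_of_variation(lstm_f1), 4))
print(round(coefficient_of_variation(palm_f1), 4))
```

This mirrors the reported pattern: a model can have the better average score yet lose a consistency-focused comparison.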
Quantifying extreme opinions on Reddit amidst the 2023 Israeli–Palestinian conflict
Natural Language Processing Journal Pub Date: 2025-05-27 DOI: 10.1016/j.nlp.2025.100156
Alessio Guerra, Marcello Lepre, Oktay Karakuş

This study investigates the dynamics of extreme opinions on social media during the 2023 Israeli–Palestinian conflict, utilising a comprehensive dataset of over 450,000 posts from four Reddit subreddits (r/Palestine, r/Judaism, r/IsraelPalestine, and r/worldnews). A lexicon-based, unsupervised methodology was developed to measure “extreme opinions” by considering factors such as anger, polarity, and subjectivity. The analysis identifies significant peaks in extremism scores that correspond to pivotal real-life events, such as the IDF’s bombings of Al Quds Hospital and the Jabalia Refugee Camp, and the end of a ceasefire following a terrorist attack. Additionally, this study explores the distribution and correlation of these scores across different subreddits and over time, providing insights into the propagation of polarised sentiments in response to conflict events. By examining the quantitative effects of each score on extremism and analysing word cloud similarities through Jaccard indices, the research offers a nuanced understanding of the factors driving extreme online opinions. Our findings show that posts exhibiting extreme sentiment surged up to 80% (an increase of 0.3 in extremism score above the average of 0.405 at the end of October) during key conflict events. Compared to recent studies that have not explicitly quantified extremism in an unsupervised manner, we contribute to the literature by addressing this gap through a novel extremism score, derived from sentiment polarity, anger, and subjectivity, to analyse Reddit discourse surrounding the 2023 Israel–Palestine conflict. This approach captures the complex interplay between real-world events and online reactions, while acknowledging the inherent challenges of measuring extremism in dynamic social media environments. Our approach also enables scalable monitoring of public sentiment extremity, providing valuable insights for policymakers and conflict researchers.

Citations: 0
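Two quantitative pieces of the pipeline described above can be sketched directly: an extremism score aggregating polarity strength, anger, and subjectivity (the equal weighting here is an assumption, not the paper's exact formula), and the Jaccard index used to compare word clouds between subreddits:

```python
def extremism_score(polarity, anger, subjectivity):
    """Aggregate score in [0, 1]; polarity contributes by strength, not sign.

    Equal weighting is assumed for illustration only.
    """
    return (abs(polarity) + anger + subjectivity) / 3.0

def jaccard(words_a, words_b):
    """Jaccard index between two word collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(words_a), set(words_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

print(extremism_score(polarity=-0.9, anger=0.8, subjectivity=0.7))   # -> 0.8
print(jaccard(["ceasefire", "hospital", "aid"],
              ["ceasefire", "aid", "border"]))                       # -> 0.5
```

In practice the polarity, anger, and subjectivity components would come from lexicon-based sentiment tooling applied to each post, and the Jaccard index would compare the top-word sets underlying each subreddit's word cloud.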
Understanding the effects of human-written paraphrases in LLM-generated text detection
Natural Language Processing Journal Pub Date: 2025-05-15 DOI: 10.1016/j.nlp.2025.100151
Hiu Ting Lau, Arkaitz Zubiaga

Natural Language Generation has been developing rapidly with the advent of large language models (LLMs). While their usage has sparked significant attention from the general public, it is important for readers to be aware when a piece of text is LLM-generated. This has brought about the need for models that enable automated LLM-generated text detection, with the aim of mitigating potential negative outcomes of such content. Existing LLM-generated text detectors show competitive performance in telling apart LLM-generated and human-written text, but this performance is likely to deteriorate when paraphrased texts are considered. In this study, we devise a new data collection strategy to build the Human & LLM Paraphrase Collection (HLPC), a first-of-its-kind dataset that incorporates human-written texts and paraphrases as well as LLM-generated texts and paraphrases. With the aim of understanding the effects of human-written paraphrases on the performance of SOTA LLM-generated text detectors (OpenAI RoBERTa and watermark detectors), we perform classification experiments that incorporate human-written paraphrases, watermarked and non-watermarked LLM-generated documents from GPT and OPT, and LLM-generated paraphrases from DIPPER and BART. The results show that the inclusion of human-written paraphrases has a significant impact on LLM-generated text detector performance, improving TPR@1%FPR at a possible cost to AUROC and accuracy.

Citations: 0
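The operating-point metric reported here, TPR@1%FPR, answers: with the detection threshold set so that at most 1% of human-written texts are wrongly flagged, what fraction of LLM-generated texts is caught? A small sketch with synthetic detector scores (all numbers invented):

```python
def tpr_at_fpr(pos_scores, neg_scores, max_fpr=0.01):
    """Best true-positive rate achievable while false-positive rate stays <= max_fpr.

    pos_scores: detector scores for LLM-generated texts (higher = more suspicious).
    neg_scores: detector scores for human-written texts.
    """
    best_tpr = 0.0
    for t in sorted(set(pos_scores) | set(neg_scores), reverse=True):
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        if fpr <= max_fpr:
            tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
            best_tpr = max(best_tpr, tpr)
    return best_tpr

# Synthetic scores: two hard human-written outliers, and 40 machine texts
# that score low (e.g. because they were paraphrased).
human   = [0.1] * 98 + [0.5, 0.9]
machine = [0.8] * 60 + [0.3] * 40

print(tpr_at_fpr(machine, human))   # -> 0.6
```

A detector can therefore improve TPR@1%FPR (its usefulness at a strict false-alarm budget) even while threshold-free summaries such as AUROC or plain accuracy degrade, which is the trade-off the abstract describes.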
Automatic detection of manipulated Bangla news: A new knowledge-driven approach
Natural Language Processing Journal Pub Date: 2025-05-12 DOI: 10.1016/j.nlp.2025.100155
Aysha Akther, Kazi Masudul Alam, Rameswar Debnath

In recent years, dissemination of misleading news has become easier than ever due to the simplicity of creating and distributing news content on online media platforms. Misleading news detection has become a global topic of interest due to its significant impact on society, economics, and politics. Automatic detection of the veracity of news remains challenging because of its diversity and close resemblance to true events. In many languages, fake news detection has been studied from different perspectives. However, in Bangla, existing efforts at fake news detection have generally relied on linguistic style analysis and latent representation-based machine learning and deep learning models, which primarily depend on manually labeled annotations. To address these challenges, we propose a knowledge-based Bangla fake news detection model that does not require model training. In our proposed manipulation detection approach, a news article is automatically labeled as fake or authentic based on an authenticity score that relies on the consistency of knowledge and semantics, underlying sentiment, and credibility of the news source. We also propose a consistent and context-aware manipulated news generation technique to facilitate the detection of partially manipulated Bangla news. We found the proposed model reliable for detecting both fake news and partially manipulated news. We also developed a dataset for Bangla fake news detection that is balanced between authentic and fake news, with news items collected from multiple domains and various news sources. The experimental evaluation of our proposed knowledge-driven approach on the developed dataset achieved 97.08% accuracy for fake news detection alone.

Citations: 0