Natural Language Processing Journal — Latest Articles

Whose morality do they speak? Unraveling cultural bias in multilingual language models
Natural Language Processing Journal, Pub Date: 2025-06-30, DOI: 10.1016/j.nlp.2025.100172
Meltem Aksoy

Abstract: Large language models (LLMs) have become integral tools in diverse domains, yet their moral reasoning capabilities across cultural and linguistic contexts remain underexplored. This study investigates whether multilingual LLMs, such as GPT-3.5-Turbo, GPT-4o-mini, Llama 3.1, and Mistral NeMo, reflect culturally specific moral values or impose dominant moral norms, particularly those rooted in English. Using the updated Moral Foundations Questionnaire (MFQ-2) in eight languages (Arabic, Farsi, English, Spanish, Japanese, Chinese, French, and Russian), the study analyzes the models' adherence to six core moral foundations: care, equality, proportionality, loyalty, authority, and purity. The results reveal significant cultural and linguistic variability, challenging the assumption of universal moral consistency in LLMs. Although some models demonstrate adaptability to diverse contexts, others exhibit biases influenced by the composition of the training data. These findings underscore the need for culturally inclusive model development to improve fairness and trust in multilingual AI systems.

Citations: 0
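The MFQ-2 scoring that the study above applies across languages reduces, at its core, to averaging Likert responses per moral foundation. A minimal sketch with invented items and responses (the real MFQ-2 item-to-foundation mapping is not reproduced here):

```python
# Sketch: aggregate MFQ-2-style Likert responses (1-5) into per-foundation
# scores, as one might do when comparing an LLM's answers across languages.
# The items and responses below are invented placeholders, not real MFQ-2 data.

def foundation_scores(responses, item_to_foundation):
    """Average the Likert responses belonging to each moral foundation."""
    totals, counts = {}, {}
    for item, score in responses.items():
        f = item_to_foundation[item]
        totals[f] = totals.get(f, 0) + score
        counts[f] = counts.get(f, 0) + 1
    return {f: totals[f] / counts[f] for f in totals}

item_to_foundation = {
    "q1": "care", "q2": "care",
    "q3": "authority", "q4": "authority",
    "q5": "purity",
}
llm_responses = {"q1": 5, "q2": 4, "q3": 2, "q4": 3, "q5": 1}

scores = foundation_scores(llm_responses, item_to_foundation)
print(scores)  # {'care': 4.5, 'authority': 2.5, 'purity': 1.0}
```

Comparing such per-foundation profiles, elicited in different languages, is one way to make the cultural variability the paper reports concrete.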
A hybrid BERT-BiRNN framework for mental health prediction using textual data
Natural Language Processing Journal, Pub Date: 2025-06-26, DOI: 10.1016/j.nlp.2025.100165
Muhammad Nouman, Sui Yang Khoo, M.A. Parvez Mahmud, Abbas Z. Kouzani

Abstract: Effective mental health prediction requires training artificial intelligence algorithms on relevant datasets obtained from individuals suffering from mental illnesses. This study employs a labelled text dataset derived from the Lyf Support app. To harness the potential of this dataset for the development of a mental health prediction tool, we propose a novel technique that utilises the bidirectional encoder representations from transformers (BERT) model to identify mental health-related text chats. This technique enables effective and accurate identification of textual content relevant to mental health, facilitating the creation of an advanced prediction model, and is capable of extracting word embeddings that retain the semantic and contextual meaning of words. Bidirectional long short-term memory (BiLSTM) and bidirectional gated recurrent unit (BiGRU) models are then employed as sequence-processing classifiers to analyse and detect signs of mental illness from text chats. Extensive experiments are conducted, and the results are compared against state-of-the-art models, showing that our method outperforms the others with 92.4% accuracy. Overall, this study establishes a foundation for future research in mental health prediction approaches. The methodologies and findings presented herein pave the way for further advancements and innovations in this field of study.

Citations: 0
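The bidirectional sequence modeling behind BiLSTM/BiGRU classifiers like the one above can be illustrated with a toy one-unit Elman RNN run in both directions over the token features; the weights and scalar "embeddings" here are invented placeholders, not the paper's trained model:

```python
import math

# Minimal sketch of bidirectional sequence encoding: run a simple tanh RNN
# over per-token features left-to-right and right-to-left, then concatenate
# the final hidden states as input to a downstream classifier. Real BiLSTM/
# BiGRU cells add gating; this toy keeps only the recurrence idea.

def rnn_pass(xs, w, u, reverse=False):
    """One-unit Elman RNN: h_t = tanh(w * x_t + u * h_{t-1})."""
    h = 0.0
    seq = reversed(xs) if reverse else xs
    states = []
    for x in seq:
        h = math.tanh(w * x + u * h)
        states.append(h)
    return states

embeddings = [0.5, -1.0, 0.25]          # stand-in for per-token BERT features
fwd = rnn_pass(embeddings, w=1.0, u=0.5)
bwd = rnn_pass(embeddings, w=1.0, u=0.5, reverse=True)
feature = (fwd[-1], bwd[-1])            # concatenated summary for a classifier
print(feature)
```

The forward state summarizes the left context and the backward state the right context, which is why bidirectional recurrent heads are a common fit on top of contextual embeddings.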
A survey and evaluation of text-to-speech systems for the Tamil language
Natural Language Processing Journal, Pub Date: 2025-06-25, DOI: 10.1016/j.nlp.2025.100171
Ahrane Mahaganapathy, Kengatharaiyer Sarveswaran

Abstract: This survey provides a comprehensive review of existing Tamil Text-to-Speech (TTS) synthesis systems, synthesis approaches, and evaluation approaches, and highlights state-of-the-art methods and the challenges of handling linguistic nuances. Voice-based interfaces are becoming part of everyday life, so it is important to have TTS systems that improve the human experience. Tamil, with its rich linguistic features and diglossic nature, presents significant challenges for speech synthesis. In addition to the survey, this work proposes a perceptual evaluation framework that adds expressiveness, low listening fatigue, and overall quality to the traditional dimensions of intelligibility and naturalness, to better evaluate the human experience. The study also uses the Comparative Mean Opinion Score (CMOS) for subjective evaluation instead of the Mean Opinion Score. A dataset for the evaluation was carefully prepared, and six widely used Tamil TTS systems were evaluated: objectively using Word Error Rate, and subjectively using the proposed framework with the support of 30 evaluators. The reliability of the subjective evaluation is assessed using Krippendorff's Alpha. The results indicate that existing systems have significant room for improvement in all perceptual dimensions. The study underscores the need for evaluation datasets and approaches that cater to the subjective perceptual dimensions of speech synthesis, and lays a foundation for future research and development in Tamil and similar TTS systems.

Citations: 0
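The objective metric used above, Word Error Rate, is the word-level edit distance between a reference transcript and a system output, normalized by reference length. A minimal sketch (English placeholder strings stand in for Tamil transcripts):

```python
# Word Error Rate: edit distance (substitutions, insertions, deletions)
# between reference and hypothesis word sequences, divided by the number
# of reference words.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i ref words into the first j hyp words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[-1][-1] / len(ref)

print(wer("the cat sat on the mat", "the cat sat mat"))  # 2 deletions / 6 words
```

In a TTS evaluation, the hypothesis would be an ASR transcript of the synthesized audio, so WER measures intelligibility rather than naturalness, which is exactly why the paper pairs it with a subjective framework.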
Improving multilabel text emotion detection with emotion interrelation anchors
Natural Language Processing Journal, Pub Date: 2025-06-25, DOI: 10.1016/j.nlp.2025.100170
Polydoros Giannouris, Vasileios Mygdalis, Ioannis Pitas

Abstract: Emotion detection studies the problem of automatically identifying emotions expressed in text. Since multiple emotions may co-occur in a single text excerpt, state-of-the-art approaches often cast this multi-label classification task as multiple independent binary classification tasks, each specialized for one emotion class. The main disadvantage of such approaches is that, by design, each binary classifier overlooks typical emotion interrelationships, such as co-occurrence (e.g., anger and fear) or mutual exclusiveness (e.g., sadness and joy). This paper proposes a simple and lightweight approach that re-introduces emotion interrelations into each binary classification task, so that each binary classifier can account for the presence of other emotions without directly inferring them. This is achieved by incorporating the proposed emotion anchors (i.e., features of representative emotional phrases) into the model of each binary classifier. More specifically, the model is trained to incorporate other emotions in its representation by learning the parameters of an attention mechanism. In experiments on multiple datasets, our approach improves emotion classification performance in both supervised and few-shot domain adaptation settings, outperforming standard binary models in accuracy and macro-averaged F1-score. The approach is generic and can be applied to other interrelated multi-label binary classification tasks.

Citations: 0
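The anchor mechanism described above can be pictured as a binary classifier attending over fixed anchor vectors for the other emotions and appending the attended summary to its own features. A toy sketch with invented 3-dimensional vectors (the paper's anchors are learned phrase features and the attention is trained; this shows only the attention idea):

```python
import math

# Sketch of the "emotion anchor" idea: score each anchor against the text
# representation, softmax the scores, and take the weighted sum of anchors
# as extra context for the binary classifier.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, anchors):
    """Softmax-weighted sum of anchor vectors, scored by dot product."""
    scores = [dot(query, a) for a in anchors]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * a[i] for w, a in zip(weights, anchors))
            for i in range(len(anchors[0]))]

anchors = [[1.0, 0.0, 0.0],   # e.g. an "anger" anchor (invented)
           [0.0, 1.0, 0.0]]   # e.g. a "fear" anchor (invented)
text_vec = [0.9, 0.1, 0.0]    # representation of the input text
context = attend(text_vec, anchors)
augmented = text_vec + context  # classifier input: own features + anchor context
print(augmented)
```

The classifier for, say, "joy" thus sees how strongly the text resembles the other emotions' anchors without ever running their classifiers, which is what keeps the approach lightweight.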
Homophobia and transphobia span identification in low-resource languages
Natural Language Processing Journal, Pub Date: 2025-06-24, DOI: 10.1016/j.nlp.2025.100169
Prasanna Kumar Kumaresan, Devendra Deepak Kayande, Ruba Priyadharshini, Paul Buitelaar, Bharathi Raja Chakravarthi

Abstract: Online platforms have become prevalent because they promote free speech and group discussion, but they also host hate speech, which can harm the psychological well-being of vulnerable people. This is especially true for members of the LGBTQ+ community, who are often the targets of homophobia and transphobia online. Our study makes three main contributions: (1) we developed a new dataset with span-level annotations for homophobia and transphobia in Tamil, English, and Marathi; (2) we employed advanced language models using BERT-based architectures, Conditional Random Field (CRF), and Bidirectional Long Short-Term Memory (BiLSTM) layers to enhance span-level detection of harmful content; and (3) we conducted benchmarking to evaluate the effectiveness of monolingual and multilingual models in detecting subtle forms of hate speech. The annotated dataset, collected from real-world social media (YouTube) content, provides diverse language contexts and enhances the representation of low-resource languages. The span-based detection approach enables models to detect subtle linguistic nuances, leading to more precise content moderation that accounts for cultural differences. The experimental results show that our models achieve effective span detection, which provides valuable information for creating inclusive moderation tools. Our research contributes to the development of AI systems that aim to reduce the burden on moderators and improve the quality of online experiences for vulnerable LGBTQ+ users.

Citations: 0
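Span-level annotation of the kind described above is commonly realized with BIO tagging. A small sketch of decoding token-level BIO predictions into (start, end, label) spans, with an invented tag sequence (the tag scheme here is assumed for illustration, not taken from the paper):

```python
# Convert token-level BIO tags into half-open (start, end, label) spans.
# "B-X" begins a span of label X, "I-X" continues it, "O" is outside.

def bio_to_spans(tags):
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:
                spans.append((start, i, label))
            start, label = i, tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            continue
        else:  # "O", or an "I-" tag that does not match the open span
            if start is not None:
                spans.append((start, i, label))
            start, label = None, None
    if start is not None:
        spans.append((start, len(tags), label))
    return spans

tags = ["O", "B-HOMOPHOBIA", "I-HOMOPHOBIA", "O", "B-TRANSPHOBIA"]
print(bio_to_spans(tags))  # [(1, 3, 'HOMOPHOBIA'), (4, 5, 'TRANSPHOBIA')]
```

Decoding to spans rather than whole-comment labels is what lets a moderation tool highlight the offending phrase instead of flagging an entire post.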
The evolution of language models: From N-Grams to LLMs, and beyond
Natural Language Processing Journal, Pub Date: 2025-06-24, DOI: 10.1016/j.nlp.2025.100168
Mohammad Ghaseminejad Raeini

Abstract: In the last couple of decades, language models and artificial intelligence technologies have advanced significantly. Along with computer vision and image processing models, large language models (LLMs) are expected to have a major impact on how AI technologies evolve. It is therefore important to study how language models have advanced since their inception and, more importantly, how they will grow in the future. In this article, we provide an overview of the evolution of language models, starting with early statistical and rule-based models and continuing through to today's transformer-based multimodal models (MM-LLMs). We discuss the shortcomings of current language models and the various aspects of these models that need to be improved. We also highlight the latest research trends in NLP and pinpoint important aspects of language models and AI technologies that need further attention. This overview provides valuable insights into the progression of language models and can be motivational and helpful for advancing state-of-the-art language models.

Citations: 0
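The N-gram models where this evolution began can be stated in a few lines: a bigram model estimates P(w_i | w_{i-1}) from corpus counts. A minimal sketch with a toy corpus:

```python
from collections import Counter

# Maximum-likelihood bigram language model: P(word | prev) is the count of
# the bigram (prev, word) divided by the count of prev as a bigram start.
# The corpus is a toy example.

corpus = "the cat sat on the mat the cat ran".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])  # how often each word starts a bigram

def p(word, prev):
    """Maximum-likelihood bigram probability P(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p("cat", "the"))  # "the" is followed by "cat" in 2 of its 3 occurrences
```

Everything that followed (smoothing, neural LMs, transformers) can be read as progressively better ways of estimating this same conditional distribution over longer contexts.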
On English-Chinese Neural Machine Translation leveraging the Transformer model
Natural Language Processing Journal, Pub Date: 2025-06-23, DOI: 10.1016/j.nlp.2025.100166
Subrota Kumar Mondal, Yijun Chen, Yuning Cheng, Hong-Ning Dai, Syed B. Alam, H.M. Dipu Kabir

Abstract: In today's era of globalization, cross-cultural communication has become increasingly frequent, and photo translation (photo, image, or scene-text translation) technology has become an important tool. With it, people can recognize and translate text in other languages without manual input or translation, which has practical value in fields such as tourism, business, education, and research. To this end, this paper aims to achieve high-accuracy English-to-Chinese photo translation, which can be divided into three stages: text detection, text recognition, and text translation (i.e., machine translation). We observe that text detection and recognition face challenges with occluded text, handwritten text, scene text, text with complex layout, distorted text, and many others; however, in this paper we limit our analysis to the translation phase. For the detection and recognition phases, we use current state-of-the-art methods: the DBNet model (Liao et al., 2020) for detection and the ABINet model (Fang et al., 2021) for recognition. For translation, we use the Transformer model with modifications aimed at improving translation accuracy, mainly in two aspects: data preprocessing and the optimizer. In data preprocessing, we use the BPE (Byte Pair Encoding) algorithm instead of basic word-centered tokenization. BPE divides words into smaller subwords, which mitigates the rare-word problem to some extent and provides better word vectors for language model training. For the optimizer, we use the Lion optimizer proposed by Google instead of the widely used Adam optimizer, which reduces the loss more quickly for small batch sizes: with batch size 256, it achieves the lowest test loss of 0.392842 (−1.072171) and the highest BLEU-4 score of 0.381281 (+0.24063). This helps reduce training resource consumption and improves the sustainability of deep learning.

Citations: 0
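The BPE preprocessing step mentioned above learns subwords by repeatedly merging the most frequent adjacent symbol pair. A toy sketch over a tiny symbol-level vocabulary (a real tokenizer learns thousands of merges from a large parallel corpus):

```python
from collections import Counter

# Byte Pair Encoding, sketched: find the most frequent adjacent symbol pair
# across the (word -> frequency) vocabulary, merge it everywhere, repeat.
# Words are stored as space-separated symbols; frequencies are toy numbers.

def most_frequent_pair(words):
    pairs = Counter()
    for word, freq in words.items():
        syms = word.split()
        for a, b in zip(syms, syms[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge(words, pair):
    a, b = pair
    return {word.replace(f"{a} {b}", f"{a}{b}"): freq
            for word, freq in words.items()}

words = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6}
for _ in range(3):
    words = merge(words, most_frequent_pair(words))
print(words)
```

Because frequent character sequences become single tokens while rare words fall back to smaller pieces, the model's vocabulary stays closed yet no word is ever out-of-vocabulary, which is the rare-word benefit the abstract refers to.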
Can "consciousness" be observed from large language model (LLM) internal states? Dissecting LLM representations obtained from Theory of Mind test with Integrated Information Theory and Span Representation analysis
Natural Language Processing Journal, Pub Date: 2025-06-19, DOI: 10.1016/j.nlp.2025.100163
Jingkai Li

Abstract: Integrated Information Theory (IIT) provides a quantitative framework for explaining the phenomenon of consciousness, positing that conscious systems comprise elements integrated through causal properties. We apply IIT 3.0 and 4.0, the latest iterations of this framework, to sequences of Large Language Model (LLM) representations, analyzing data derived from existing Theory of Mind (ToM) test results. Our study systematically investigates whether differences in ToM test performance, when present in the LLM representations, can be revealed by IIT estimates: Φ^max (IIT 3.0), Φ (IIT 4.0), Conceptual Information (IIT 3.0), and Φ-structure (IIT 4.0). Furthermore, we compare these metrics with Span Representations that are independent of any estimate of consciousness, in order to differentiate between potential "consciousness" phenomena and inherent separations within the LLM representational space. We conduct comprehensive experiments examining variations across LLM transformer layers and linguistic spans from stimuli. Our results suggest that sequences of contemporary Transformer-based LLM representations lack statistically significant indicators of observed "consciousness" phenomena, but exhibit intriguing patterns under spatio-permutational analyses.

Citations: 0
Enhancing grammatical documentation for endangered languages with graph-based meaning representation and Loopy Belief Propagation
Natural Language Processing Journal, Pub Date: 2025-06-18, DOI: 10.1016/j.nlp.2025.100164
Sebastien Christian

Abstract: DIG4EL (Digital Inferential Grammars for Endangered Languages) is a method, embodied in software, designed to assist linguists and teachers in producing grammatical descriptions of endangered languages. DIG4EL integrates linguistic knowledge from extensive databases such as WALS and Grambank with automated observations of controlled data collected using Conversational Questionnaires. Linguistic knowledge and automated observations provide priors to a Bayesian network of grammatical parameters, in which parameters are interconnected by directional conditional probability matrices derived from statistics on world languages. Inference of unknown parameter values is performed using Loopy Belief Propagation, achieving an average accuracy of 76% and a median accuracy of 85% in an experimental grammatical domain: determining the values of eight parameters related to canonical word order across 116 languages from diverse language families. DIG4EL produces outputs as structured files for computational use, Microsoft Word files, or plain-language grammatical descriptions generated by a Large Language Model; these descriptions rely solely on vetted data and observed examples, with prompts crafted explicitly to prevent external information or hallucinations. By leveraging probabilistic modeling and rich yet quickly assembled linguistic data, DIG4EL provides a powerful, accessible tool for creating grammatical descriptions and language-teaching materials with minimal intervention from linguists. It significantly reduces the time and expertise required by traditional documentation workflows, helping ensure that endangered languages are better documented and taught.

Citations: 0
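The inference DIG4EL performs at scale can be illustrated by a single sum-product message pass: the belief over one grammatical parameter is obtained by marginalizing a related parameter's prior through a conditional probability matrix. On a tree this pass is exact; Loopy Belief Propagation iterates such passes on graphs with cycles. All probabilities below are invented placeholders, not WALS or Grambank statistics:

```python
# Minimal sum-product sketch: update the belief over a child parameter
# (e.g. adposition order) from the prior over a parent parameter (e.g.
# canonical word order) via P(child | parent). Numbers are invented.

word_order_prior = {"SOV": 0.6, "SVO": 0.4}
cond = {  # P(adposition order | word order), invented placeholder values
    "SOV": {"postposition": 0.9, "preposition": 0.1},
    "SVO": {"postposition": 0.3, "preposition": 0.7},
}

def message(prior, cond):
    """Marginalize the parent out: P(child) = sum_p P(child | p) * P(p)."""
    out = {}
    for parent, p_parent in prior.items():
        for child, p_child in cond[parent].items():
            out[child] = out.get(child, 0.0) + p_child * p_parent
    return out

belief = message(word_order_prior, cond)
print(belief)  # postposition ≈ 0.66, preposition ≈ 0.34
```

In the full system, each parameter node combines such incoming messages from several neighbors and the process repeats until the beliefs stabilize, which is how the eight word-order parameters in the experiment are inferred jointly.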
Next-generation image captioning: A survey of methodologies and emerging challenges from transformers to Multimodal Large Language Models
Natural Language Processing Journal, Pub Date: 2025-06-10, DOI: 10.1016/j.nlp.2025.100159
Huda Diab Abdulgalil, Otman A. Basir

Abstract: The widespread availability of visual data on the Internet has fueled significant interest in image-to-text captioning systems. Automated image captioning remains a challenging multimodal analytics task, integrating advances in both Computer Vision (CV) and Natural Language Processing (NLP) to understand image content and generate semantically meaningful textual descriptions. Modern deep learning approaches have supplanted traditional approaches to image captioning, leading to more efficient and sophisticated models, and the development of attention mechanisms and transformer-based architectures has further enhanced the modeling of both language and visual data. Despite these gains, challenges such as long-tailed object recognition, bias in training data, and shortcomings in evaluation metrics constrain the capabilities of current models. An important breakthrough has come with the recent emergence of Multimodal Large Language Models (MLLMs): by incorporating textual and visual data, MLLMs offer improved captioning flexibility, generative capabilities, and reasoning, but they also introduce new challenges, including faithfulness, grounding, and computational cost. Although relatively few studies have comprehensively surveyed these developments, this paper provides a thorough analysis of Transformer-based captioning approaches, investigates the shift to MLLMs, and discusses the associated challenges and opportunities. We also present a performance comparison of the latest models on the MS-COCO benchmark and conclude with perspectives on potential future research directions.

Citations: 0