Applied Intelligence: Latest Articles

Identification and empirical investigation of factors influencing cost estimation in Agile development
IF 3.5, Zone 2, Computer Science
Applied Intelligence, Pub Date: 2026-05-02, DOI: 10.1007/s10489-026-07259-1
Xiaoyan Zhao, Zulkefli Mansor, Rozilawati Razali, Mohd Zakree Ahmad Nazri, Liangyu Li, Xuwei Guo
Abstract: Accurate cost estimation is crucial for the success of agile software development projects. However, it remains a significant challenge due to the lack of cost drivers specific to agile methodologies. This study addresses this gap by identifying and empirically validating the key factors influencing cost estimation in agile development. This research employs a dual-method approach that combines a systematic literature review (SLR) with an empirical study. The SLR analyzed 25 studies published over the past decade from 4 major data repositories to identify potential cost drivers. As a result, 55 cost drivers were found and classified into four aspects: project, people, process, and product, of which 19 factors had a higher-than-average importance ratio. To verify these factors, this study collected 150 questionnaires from 12 countries and regions. Respondents evaluated these 55 cost drivers based on project experience and considered 22 to have a higher-than-average importance ratio. This research then used the Pearson and Spearman correlation tests to analyze the correlation between the SLR findings and the empirical study. The correlation coefficients were 0.710 and 0.834, respectively, and both p-values were less than 0.001, indicating a strong positive correlation between the two. Finally, this research analyzed the factors with an above-average importance ratio in both approaches, obtained 17 key cost drivers, and elaborated on their meaning. This study fills an important gap by systematically identifying and empirically validating agile development-specific cost drivers. It contributes to cost estimation for agile development and is a necessary step toward cost estimation research that uses mathematical methods such as optimization and machine learning algorithms, which can further improve estimation accuracy.
Volume 56, issue 7, Journal Article
Citations: 0
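The agreement check described in the abstract above (Pearson and Spearman correlation between two sets of importance ratios, followed by keeping the factors that are above average in both) can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the factor names and scores are hypothetical, and the helper names `pearson`, `average_ranks`, `spearman`, and `above_average` are made up for this sketch.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes both sequences have non-zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def above_average(scores):
    """Names whose score is strictly above the mean score."""
    mean_score = sum(scores.values()) / len(scores)
    return {name for name, s in scores.items() if s > mean_score}


# Hypothetical importance ratios (NOT data from the paper).
slr = {"team experience": 0.60, "requirements volatility": 0.48,
       "project size": 0.44, "tooling": 0.12, "documentation": 0.08}
survey = {"team experience": 0.82, "requirements volatility": 0.77,
          "project size": 0.51, "tooling": 0.40, "documentation": 0.21}

names = sorted(slr)
r = pearson([slr[n] for n in names], [survey[n] for n in names])
rho = spearman([slr[n] for n in names], [survey[n] for n in names])
key_drivers = above_average(slr) & above_average(survey)
```

Significance testing (the p-values reported in the abstract) is left out of this sketch; a statistics package would normally supply it.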
Probabilistic linguistic interval functional evaluation mechanism under high-frequency aperiodic reviews: a hotel screening application
IF 3.5, Zone 2, Computer Science
Applied Intelligence, Pub Date: 2026-05-02, DOI: 10.1007/s10489-026-07205-1
Bo Li, Jie Peng, Ka Li, Zeshui Xu, Chonghui Zhang
Abstract: Access to reliable product reviews is crucial for consumers to make well-informed purchasing decisions. Extracting critical information from high-frequency, real-time, and unstructured text data also helps ascertain public opinion and can outline important indicators for businesses. Therefore, to improve the effectiveness of management decision-making in the context of high-frequency data, this paper constructs a three-way decision (TWD) evaluation model by introducing a novel interval-valued uncertainty function. First, this paper applies natural language processing to transform high-frequency comments and leverages the expressive power of probabilistic linguistic term sets to characterize complex uncertainties. This leads to a new concept known as the probabilistic linguistic interval function. Second, by introducing clamped B-spline basis functions, high-frequency linguistic information is fitted into an interval function via an optimization model that accounts for individual decision-maker deviations. Third, to further analyze the large online review data, a new membership degree matrix based on a clustering algorithm is constructed. The conditional probabilities are calculated to determine the classification regions, which provide the fundamental basis for characterizing and classifying alternatives. Based on favorable and unfavorable situations, the probabilistic linguistic TWD mechanism is constructed to process high-frequency aperiodic linguistic comments, enabling the division of alternatives into class domains depending on their utility values. Finally, a case study is used to demonstrate the proposed method, and parameter and comparative analyses are employed to validate its practicality and superiority.
Volume 56, issue 7, Journal Article
Citations: 0
Attribute reduction learning based on double-quantitative similarity granulations and fusion measures in interval-set decision systems
IF 3.5, Zone 2, Computer Science
Applied Intelligence, Pub Date: 2026-05-02, DOI: 10.1007/s10489-026-07198-x
Xin Xie, Xianyong Zhang, Xiaoling Yang, Jilin Yang
Abstract: Attribute reductions facilitate data learning, and they can function on interval-set decision systems (ISDSs). In ISDSs, similarity and dependency degrees are the fundamental measures underlying knowledge granulations and attribute reductions, respectively; however, they adhere to only single quantifications of relativeness or absoluteness, so they exhibit measurement weaknesses and leave room for improving attribute reductions. In this paper, for ISDSs, the underlying measures are improved by using double-quantitative fusions, so double-quantitative similarity granulations and fusion measures emerge to advance attribute reductions for classification learning. First, the double-quantitative similarity of interval sets is proposed by balancing absolute and relative similarities, and its equivalence granulation matches but improves the current equivalence granulation from absolute similarity. Then, double-hierarchical and double-quantitative precision-dependencies are two-dimensionally constructed by using geometric and arithmetic mean fusions. Concretely, the relative precision and absolute dependency are directly fused by statistical averages, so the double-quantitative precision-dependency emerges at the classification level; the class-level precision and dependency are hierarchically determined, and they rely on statistical mean fusions and hierarchical summation integrations to yield the new double-quantitative precision-dependency at the classification level; thus, two-mean and two-level fusions formulate \(2\times 2=4\) precision-dependencies for classification learning, and they exhibit size relationships and granulation non-monotonicity. Furthermore, by adding absolute and double-quantitative similarity granulations as well as single dependencies, \(2\times(1+2\times 2)=10\) measures systematically motivate 10 heuristic algorithms of attribute reductions, of which 9 are novel and improved. Finally, reduction measures and algorithms are validated by data experiments, and the new reduction algorithms outperform several contrast algorithms in classification performance.
Volume 56, issue 7, Journal Article
Citations: 0
Communication scheduling via proposed multi-agent reinforcement learning with centralized learning and distributed execution
IF 3.5, Zone 2, Computer Science
Applied Intelligence, Pub Date: 2026-05-02, DOI: 10.1007/s10489-026-07202-4
Ajay Nagendra Nama
Abstract: A crucial component of communication systems’ architecture, communication scheduling is essential to guaranteeing their effectiveness, dependability, and smooth operation. Fundamentally, communication scheduling controls the prioritization, frequency, and timing of information exchanges between various actors in these systems. Scheduling eliminates conflicts in shared communication channels, avoids network congestion, and maximizes resource efficiency by carefully planning when and how data is transferred. In situations like wireless networks, where bandwidth is scarce and shared among several devices, this careful management becomes more important to prevent interference. Additionally, scheduling guarantees timely delivery and reduces latency in real-time applications and systems handling vital information. Effective scheduling saves energy and extends network life in resource-constrained contexts like sensor networks or Internet of Things (IoT) devices. Efficient communication scheduling is thus the foundation of efficient, dependable, and responsive communication systems: it permits smooth data flow while dynamically adjusting to changing network conditions. For this reason, this work introduces a Communication Scheduling utilizing Multi-Agent Reinforcement Learning (CS-MARL) approach that combines distributed execution with centralized learning. To offer a scalable and flexible solution for communication scheduling inside complex systems, the suggested strategy incorporates a notion known as "communication lagging" to strike a balance between centralized learning and decentralized decision-making. The effectiveness of the CS-MARL technique for communication scheduling in complicated situations is then assessed. With execution times of 7.375 s, 5.910 s, and 4.610 s for agent3, agent4, and agent5, respectively, the CS-MARL algorithm consistently outperforms all others, yielding the biggest improvement.
Volume 56, issue 7, Journal Article
Citations: 0
Robust minimax multi-agent deep deterministic policy gradient for reward uncertainty
IF 3.5, Zone 2, Computer Science
Applied Intelligence, Pub Date: 2026-05-02, DOI: 10.1007/s10489-026-07255-5
Daicheng Song, Qiming Yang, Yixuan Lin, Haoxuan Zeng, Jingwen Chong, Li Song
Abstract: Despite advancements in Multi-Agent Deep Reinforcement Learning (MADRL), agents often exhibit fragility in dynamic adversarial environments due to reward function uncertainty and opponent policy shifts. Traditional RL algorithms struggle with unstable policy convergence and robustness issues under such uncertainties, as they inadequately model worst-case adversarial perturbations. To address this, we propose Robust-M3DDPG, a robust minimax multi-agent reinforcement learning framework that integrates Nash Equilibrium principles with minimax optimization. The approach formalizes the problem as a Robust Markov Game, explicitly modeling adversarial disturbances during policy optimization to enhance robustness in non-stationary environments. Key contributions include: (1) developing an Actor-Critic algorithm-based method to determine Nash Equilibrium policies; (2) extending the widely used Multi-Agent Deep Deterministic Policy Gradient Algorithm (MADDPG) with Robust Markov Games and minimax optimization for robust policy learning; and (3) proposing a Multi-Agent Adversarial Learning (MAAL) framework for efficiently solving adversarial policies to fulfill the minimax requirements. Evaluations across four cooperative-competitive multi-agent environments with different reward uncertainty levels demonstrate that Robust-M3DDPG significantly outperforms existing MADRL baselines in scenarios with high reward uncertainty and adversarial dynamics. This work bridges the gap between theoretical robustness guarantees and practical multi-agent reinforcement learning deployment under real-world perturbations.
Volume 56, issue 7, Journal Article
Citations: 0
TrafficMCA: an effective LLM-empowered framework for traffic flow prediction via mutual cross-attention
IF 3.5, Zone 2, Computer Science
Applied Intelligence, Pub Date: 2026-05-02, DOI: 10.1007/s10489-026-07251-9
Yiwu Xu, Yun Chen
Abstract: Traffic flow prediction is an essential area of intelligent transportation systems with broad practical applications. Recently, Large Language Models (LLMs) that integrate traffic data with textual prompts have achieved remarkable performance improvements in traffic flow prediction. However, we observed that current LLM-based methods struggle to effectively align numerical traffic sequences and textual prompts during cross-modal fusion, leading to inter-modal interference that limits further performance gains. To address this issue, we propose TrafficMCA, an LLM-empowered framework for traffic flow prediction via mutual cross-attention. Specifically, we design a dual-modality encoding module comprising two branches: the traffic flow encoding branch extracts fundamental spatio-temporal features from traffic data by integrating timestep, hour-of-day, and spatial embeddings, while the prompt text encoding branch leverages a pre-trained encoder to extract rich semantic features from textual prompts. To enhance cross-modal fusion, we introduce a Mutual Cross-Attention (MCA) mechanism that explicitly captures complementary information between the two modalities, enabling collaborative guidance and bidirectional enhancement of features. As another key design, we adopt Low-Rank Adaptation (LoRA) to fine-tune the pre-trained LLM backbone in TrafficMCA, which significantly reduces computational overhead while effectively maintaining predictive performance. Extensive experiments demonstrate that TrafficMCA outperforms 15 state-of-the-art methods and exhibits strong generalization capabilities in few-shot and zero-shot scenarios.
Volume 56, issue 7, Journal Article
Citations: 0
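To make the idea of bidirectional cross-attention in the abstract above more concrete, here is a rough sketch in plain Python. It is an illustration under strong assumptions, not the TrafficMCA module: there are no learned query/key/value projections, no multiple heads, and the residual fusion at the end is a guess at one plausible way to combine the two directions. The function names are hypothetical.

```python
from math import exp, sqrt


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention where one modality queries the other.

    Assumption: keys and values are the same vectors (no projections).
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / sqrt(dim)
                  for k in keys_values]
        weights = softmax(scores)
        attended.append([sum(w * kv[c] for w, kv in zip(weights, keys_values))
                         for c in range(dim)])
    return attended


def mutual_cross_attention(traffic_tokens, text_tokens):
    """Run attention in both directions and add each result residually."""
    traffic_from_text = cross_attention(traffic_tokens, text_tokens)
    text_from_traffic = cross_attention(text_tokens, traffic_tokens)
    traffic_out = [[a + b for a, b in zip(orig, ctx)]
                   for orig, ctx in zip(traffic_tokens, traffic_from_text)]
    text_out = [[a + b for a, b in zip(orig, ctx)]
                for orig, ctx in zip(text_tokens, text_from_traffic)]
    return traffic_out, text_out


# Toy embeddings (hypothetical): 2 traffic tokens and 1 prompt token, dim 2.
traffic_out, text_out = mutual_cross_attention(
    [[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]])
```

Each modality keeps its own sequence length; only the content of each token is enriched with context from the other modality.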
Audio–visual segmentation via hierarchical side tuning with state space model
IF 3.5, Zone 2, Computer Science
Applied Intelligence, Pub Date: 2026-05-02, DOI: 10.1007/s10489-026-07249-3
Wenyi Xia, Qingwei Geng, Xiaodong Gu
Abstract: Audio-Visual Segmentation (AVS) is a task aimed at predicting pixel-level masks for sound-producing objects in videos. Existing AVS methods typically involve a heavy audio encoder and use full fine-tuning or additional structures like adapters on the visual branch. We argue that these approaches incur significant training costs and may disrupt the pretrained model’s prior knowledge. To address this, we propose a novel hierarchical side-tuning framework for AVS, utilizing a side network that simultaneously performs audio encoding and cross-domain adaptation. By freezing the visual encoder and only tuning the side network, we significantly reduce the number of parameters to be trained. Additionally, inspired by the recent success of state space models (SSMs), we introduce an Audio-Visual Fusion (AVF) module in the side network and design a lightweight SSM-based decoder to enhance feature fusion and decoding. Experimental results on multiple AVS datasets demonstrate that our method achieves competitive or even superior performance compared with state-of-the-art approaches, while using fewer learnable parameters.
Volume 56, issue 7, Journal Article
Citations: 0
Sentiment analysis on review texts using category of words information and string kernels
IF 3.5, Zone 2, Computer Science
Applied Intelligence, Pub Date: 2026-05-01, DOI: 10.1007/s10489-026-07256-4
Jose M. Cuevas-Muñoz, Nicolás E. García-Pedrajas, Aida De Haro-García
Abstract: With millions of opinions written every day across the internet, analyzing review sentiment has been shown to be an interesting and relevant problem. Support vector machines offer an excellent alternative when the amount of available data makes other models, such as deep learning, infeasible. A usual way to detect hidden sentiments in textual data is to address the mutual information through a corpus with a support vector machine or any other sophisticated classification algorithm. Approaches that are able to extract information from sequences of words, such as string kernels, have the potential for better performance. However, finding similarities can be difficult given the ample texts used to express opinions and the wide variety of vocabulary. To solve that problem, we suggest using clustering methods to automatically group words into categories based on a word vector, replacing the words in the dataset with their corresponding categories, and then using these categories to find mutual information in the text with support vector machines that use string kernels. This approach significantly reduces the token space and enhances the efficiency of the kernel methods. The proposed method showed better performance than state-of-the-art approaches for this task in a set of real-world problems. Different models were tested against our proposal. Results indicate that the proposed method has the ability to extract useful data from opinions in long texts and remains an interesting option for review sentiment analysis in general, even outperforming other state-of-the-art methods in certain datasets. It also opens the possibility of applying the same philosophy to deep learning and similar models.
Volume 56, issue 7, Journal Article
Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10489-026-07256-4.pdf
Citations: 0
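The effect of replacing words with categories before applying a string kernel, as described in the abstract above, can be shown with a small sketch. The assumptions here are substantial: a hand-written dictionary stands in for the clustering over word vectors, and a simple p-spectrum kernel (shared n-gram counts) stands in for whatever string kernel the authors use. The names `to_categories` and `spectrum_kernel` are hypothetical.

```python
from collections import Counter


def to_categories(tokens, word_to_category, unknown="UNK"):
    """Replace each word by its category label (fallback label if unseen)."""
    return [word_to_category.get(tok, unknown) for tok in tokens]


def spectrum_kernel(seq_a, seq_b, p=2):
    """p-spectrum kernel: dot product of contiguous p-gram count vectors."""
    grams_a = Counter(tuple(seq_a[i:i + p]) for i in range(len(seq_a) - p + 1))
    grams_b = Counter(tuple(seq_b[i:i + p]) for i in range(len(seq_b) - p + 1))
    return sum(count * grams_b[gram] for gram, count in grams_a.items())


# Hypothetical categories; in the paper these come from clustering word vectors.
categories = {"great": "POS", "excellent": "POS",
              "movie": "ITEM", "film": "ITEM"}

review_a = "great movie".split()
review_b = "excellent film".split()

raw_similarity = spectrum_kernel(review_a, review_b)
category_similarity = spectrum_kernel(to_categories(review_a, categories),
                                      to_categories(review_b, categories))
```

On raw words the two reviews share no bigram, while on category sequences they share one. This is the kind of vocabulary gap the category substitution is meant to close; the resulting kernel values would then feed a kernel classifier such as a support vector machine.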
CovMADT: efficient offline multi-agent reinforcement learning via convex Markov games
IF 3.5, Zone 2, Computer Science
Applied Intelligence, Pub Date: 2026-04-30, DOI: 10.1007/s10489-026-07241-x
Sikong Wen, Rui Wang, Dechen Wu, Zixuan Wang
Abstract: Offline reinforcement learning (RL) allows agents to learn effective policies without any direct interaction with the environment by relying solely on pre-collected data. However, conventional offline RL methods face challenges such as out-of-distribution generalization and the curse of dimensionality. To mitigate these difficulties, we propose the convex Markov game-based Multi-Agent Decision Transformer (CovMADT) algorithm, which provides a theoretical guarantee of a pure-strategy Nash equilibrium under strictly concave utility functions. Within the framework of convex Markov games (cMG), we estimate state transition probabilities in continuous multi-dimensional spaces by leveraging reproducing kernel Hilbert space (RKHS) embeddings and the empirical distribution of states, while simultaneously modeling imitation-like (imitation-regularized) utilities. To improve model performance, we integrate Mean-Field Value Iteration (MFVI) as the critic, exploit permutation invariance and kernel techniques to optimize computational efficiency, and empirically validate their effectiveness in mitigating Transformer degradation. The experimental results demonstrate that CovMADT enables agents to learn complex coordination strategies and achieve superior performance in both competitive and cooperative tasks by accurately capturing underlying physical dynamics. The code will be published after it is accepted.
Volume 56, issue 6, Journal Article
Citations: 0
Probing subjective judgment variance in LLM evaluators: A framework for robust IR evaluation
IF 3.5, Zone 2, Computer Science
Applied Intelligence, Pub Date: 2026-04-29, DOI: 10.1007/s10489-026-07244-8
Ritesh Kumar
Abstract: Large Language Models (LLMs) are increasingly used as evaluators for subjective Information Retrieval (IR) tasks, yet their reliability under ambiguity remains poorly understood. We systematically probe LLM-as-a-judge systems by deliberately maximizing inter-model disagreement, which we formalize as horizontal variance. Across five widely used LLM evaluators, we observe that ambiguity-based prompts induce up to a 63% increase in horizontal variance, while intra-model (vertical) variance remains stable. This indicates systematic, not random, evaluator inconsistency. Our results show that models may appear stable in isolation yet diverge substantially when evaluated collectively, exposing hidden bias and instability. These findings highlight the risks of naive deployment of LLM evaluators in fairness-sensitive IR pipelines and motivate multi-model, variance-aware evaluation strategies.
Volume 56, issue 6, Journal Article
Citations: 0
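The distinction between inter-model (horizontal) and intra-model (vertical) variance in the abstract above can be illustrated with a small sketch. The abstract does not give the exact formulas, so the definitions below are assumptions: vertical variance is taken as the average, over models, of the variance across repeated runs of the same model, and horizontal variance as the variance of the per-model mean scores. Model names and scores are hypothetical.

```python
from statistics import mean, pvariance


def vertical_variance(scores_by_model):
    """Average within-model variance across repeated judgments (assumed definition)."""
    return mean([pvariance(runs) for runs in scores_by_model.values()])


def horizontal_variance(scores_by_model):
    """Variance of per-model mean scores across models (assumed definition)."""
    return pvariance([mean(runs) for runs in scores_by_model.values()])


# Hypothetical relevance scores (1-5) for one query-document pair,
# three repeated runs per evaluator model.
clear_prompt = {"model_a": [4, 4, 4], "model_b": [4, 4, 4], "model_c": [4, 4, 4]}
ambiguous_prompt = {"model_a": [2, 2, 2], "model_b": [4, 4, 4], "model_c": [5, 5, 5]}

clear_h = horizontal_variance(clear_prompt)
ambiguous_h = horizontal_variance(ambiguous_prompt)
ambiguous_v = vertical_variance(ambiguous_prompt)
```

In this toy data each model is perfectly self-consistent (zero vertical variance), yet the models disagree with each other under the ambiguous prompt. That is the pattern the abstract describes as stable in isolation but divergent collectively.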