An Improved Best-Worst Method Integrated With Combined Compromise Solution for Evaluating Large Language Models

IF 3.7 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
O. S. Albahri, M. A. Alsalem, A. S. Albahri, Moamin A. Mahmoud, Laith Alzubaidi, A. H. Alamoodi, Iman Mohamad Sharaf
{"title":"大型语言模型评估的一种改进的最佳-最差方法与组合折衷解","authors":"O. S. Albahri,&nbsp;M. A. Alsalem,&nbsp;A. S. Albahri,&nbsp;Moamin A. Mahmoud,&nbsp;Laith Alzubaidi,&nbsp;A. H. Alamoodi,&nbsp;Iman Mohamad Sharaf","doi":"10.1155/int/2376097","DOIUrl":null,"url":null,"abstract":"<div>\n <p>The emergence of large language models (LLMs) has substantially changed the artificial intelligence field, enabling its wide use over different domains. As various LLM alternatives have been developed, the current study proposes a novel decision-support framework for evaluating and benchmarking LLMs based on multicriteria decision-making (MCDM) techniques. In the proposed framework, an improved version of the best-worst method (BWM) is proposed to effectively reduce the computational complexity of assigning a critical weight for the evaluation criteria of LLMs. Then, the improved BWM is integrated with the combined compromise solution (CoCoSo) method for ranking LLM alternatives. Findings show that the improved BWM successfully computes the criteria weights with low computational complexity compared to the original BWM. According to the enhanced BWM, the ‘factual errors’ criterion received the highest significant weight (0.2681), while the ‘logical inconsistencies’ criteria obtained the lowest (0.0827). The rest of the criteria were distributed in between that range. Subsequently, CoCoSo ranked the involved LLM alternatives in two different runs based on the extracted weights. Sensitivity analysis was employed to evaluate the effect of the assessment criteria on LLMs’ evaluation.</p>\n </div>","PeriodicalId":14089,"journal":{"name":"International Journal of Intelligent Systems","volume":"2025 1","pages":""},"PeriodicalIF":3.7000,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1155/int/2376097","citationCount":"0","resultStr":"{\"title\":\"An Improved Best-Worst Method Integrated With Combined Compromise Solution for Evaluating Large Language Models\",\"authors\":\"O. S. Albahri,&nbsp;M. A. Alsalem,&nbsp;A. S. Albahri,&nbsp;Moamin A. Mahmoud,&nbsp;Laith Alzubaidi,&nbsp;A. H. Alamoodi,&nbsp;Iman Mohamad Sharaf\",\"doi\":\"10.1155/int/2376097\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n <p>The emergence of large language models (LLMs) has substantially changed the artificial intelligence field, enabling its wide use over different domains. As various LLM alternatives have been developed, the current study proposes a novel decision-support framework for evaluating and benchmarking LLMs based on multicriteria decision-making (MCDM) techniques. In the proposed framework, an improved version of the best-worst method (BWM) is proposed to effectively reduce the computational complexity of assigning a critical weight for the evaluation criteria of LLMs. Then, the improved BWM is integrated with the combined compromise solution (CoCoSo) method for ranking LLM alternatives. Findings show that the improved BWM successfully computes the criteria weights with low computational complexity compared to the original BWM. According to the enhanced BWM, the ‘factual errors’ criterion received the highest significant weight (0.2681), while the ‘logical inconsistencies’ criteria obtained the lowest (0.0827). The rest of the criteria were distributed in between that range. Subsequently, CoCoSo ranked the involved LLM alternatives in two different runs based on the extracted weights. 
Sensitivity analysis was employed to evaluate the effect of the assessment criteria on LLMs’ evaluation.</p>\\n </div>\",\"PeriodicalId\":14089,\"journal\":{\"name\":\"International Journal of Intelligent Systems\",\"volume\":\"2025 1\",\"pages\":\"\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2025-08-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1155/int/2376097\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Intelligent Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1155/int/2376097\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Intelligent Systems","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1155/int/2376097","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

The emergence of large language models (LLMs) has substantially changed the artificial intelligence field, enabling their wide use across different domains. As various LLM alternatives have been developed, the current study proposes a novel decision-support framework for evaluating and benchmarking LLMs based on multicriteria decision-making (MCDM) techniques. In the proposed framework, an improved version of the best-worst method (BWM) is introduced to effectively reduce the computational complexity of assigning critical weights to the evaluation criteria of LLMs. Then, the improved BWM is integrated with the combined compromise solution (CoCoSo) method for ranking LLM alternatives. Findings show that the improved BWM successfully computes the criteria weights with low computational complexity compared with the original BWM. According to the improved BWM, the ‘factual errors’ criterion received the highest weight (0.2681), while the ‘logical inconsistencies’ criterion obtained the lowest (0.0827); the remaining criteria fell within that range. Subsequently, CoCoSo ranked the LLM alternatives in two different runs based on the extracted weights. Sensitivity analysis was employed to evaluate the effect of the assessment criteria on the LLMs’ evaluation.
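The abstract does not give the full weight vector, the decision matrix, or the mechanics of the authors' improved BWM, so the sketch below only illustrates the standard CoCoSo ranking step (Yazdani et al.'s formulation) in NumPy, not the paper's implementation. The decision matrix, the criteria other than the two with reported weights ('factual errors' 0.2681 and 'logical inconsistencies' 0.0827), and the remaining weight values are hypothetical placeholders.

```python
# Minimal sketch of the standard CoCoSo ranking step, under the assumptions above.
import numpy as np

def cocoso_rank(X, weights, benefit, lam=0.5):
    """Rank alternatives (rows of X) over criteria (columns) with CoCoSo."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Step 1: linear min-max normalisation; direction depends on criterion type.
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    R = np.where(benefit, (X - xmin) / (xmax - xmin), (xmax - X) / (xmax - xmin))

    # Step 2: weighted-sum (S) and weighted-power (P) comparability measures.
    S = (R * w).sum(axis=1)
    P = (R ** w).sum(axis=1)

    # Step 3: three appraisal scores.
    k_a = (S + P) / (S + P).sum()
    k_b = S / S.min() + P / P.min()
    k_c = (lam * S + (1 - lam) * P) / (lam * S.max() + (1 - lam) * P.max())

    # Step 4: aggregate into the final CoCoSo score; higher is better.
    k = (k_a * k_b * k_c) ** (1 / 3) + (k_a + k_b + k_c) / 3
    return np.argsort(-k), k

# Hypothetical scores for 3 LLM alternatives over 4 criteria (all treated as
# benefit criteria after inverting error counts). Only the first two weights
# come from the abstract; the rest are illustrative and sum the vector to 1.
X = [[0.8, 0.7, 0.9, 0.6],
     [0.6, 0.9, 0.7, 0.8],
     [0.9, 0.6, 0.8, 0.7]]
weights = [0.2681, 0.0827, 0.3500, 0.2992]
benefit = np.array([True, True, True, True])
order, scores = cocoso_rank(X, weights, benefit)
print("Ranking (best first):", order, "scores:", scores)
```

Running this prints the alternatives from best to worst together with their aggregate CoCoSo scores; per the abstract, the paper applies the same ranking step in two separate runs using the weights produced by the improved BWM.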

Source journal
International Journal of Intelligent Systems
Category: Engineering & Technology, Computer Science: Artificial Intelligence
CiteScore: 11.30
Self-citation rate: 14.30%
Articles per year: 304
Review time: 9 months
Journal introduction: The International Journal of Intelligent Systems serves as a forum for individuals interested in tapping into the vast theories based on intelligent systems construction. With its peer-reviewed format, the journal explores several fascinating editorials written by today's experts in the field. Because new developments are being introduced each day, there's much to be learned: examination, analysis creation, information retrieval, man-computer interactions, and more. The International Journal of Intelligent Systems uses charts and illustrations to demonstrate these ground-breaking issues, and encourages readers to share their thoughts and experiences.