An Improved Best-Worst Method Integrated With Combined Compromise Solution for Evaluating Large Language Models

IF 3.7 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
O. S. Albahri, M. A. Alsalem, A. S. Albahri, Moamin A. Mahmoud, Laith Alzubaidi, A. H. Alamoodi, Iman Mohamad Sharaf
{"title":"大型语言模型评估的一种改进的最佳-最差方法与组合折衷解","authors":"O. S. Albahri,&nbsp;M. A. Alsalem,&nbsp;A. S. Albahri,&nbsp;Moamin A. Mahmoud,&nbsp;Laith Alzubaidi,&nbsp;A. H. Alamoodi,&nbsp;Iman Mohamad Sharaf","doi":"10.1155/int/2376097","DOIUrl":null,"url":null,"abstract":"<div>\n <p>The emergence of large language models (LLMs) has substantially changed the artificial intelligence field, enabling its wide use over different domains. As various LLM alternatives have been developed, the current study proposes a novel decision-support framework for evaluating and benchmarking LLMs based on multicriteria decision-making (MCDM) techniques. In the proposed framework, an improved version of the best-worst method (BWM) is proposed to effectively reduce the computational complexity of assigning a critical weight for the evaluation criteria of LLMs. Then, the improved BWM is integrated with the combined compromise solution (CoCoSo) method for ranking LLM alternatives. Findings show that the improved BWM successfully computes the criteria weights with low computational complexity compared to the original BWM. According to the enhanced BWM, the ‘factual errors’ criterion received the highest significant weight (0.2681), while the ‘logical inconsistencies’ criteria obtained the lowest (0.0827). The rest of the criteria were distributed in between that range. Subsequently, CoCoSo ranked the involved LLM alternatives in two different runs based on the extracted weights. Sensitivity analysis was employed to evaluate the effect of the assessment criteria on LLMs’ evaluation.</p>\n </div>","PeriodicalId":14089,"journal":{"name":"International Journal of Intelligent Systems","volume":"2025 1","pages":""},"PeriodicalIF":3.7000,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1155/int/2376097","citationCount":"0","resultStr":"{\"title\":\"An Improved Best-Worst Method Integrated With Combined Compromise Solution for Evaluating Large Language Models\",\"authors\":\"O. S. Albahri,&nbsp;M. A. Alsalem,&nbsp;A. S. Albahri,&nbsp;Moamin A. Mahmoud,&nbsp;Laith Alzubaidi,&nbsp;A. H. Alamoodi,&nbsp;Iman Mohamad Sharaf\",\"doi\":\"10.1155/int/2376097\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n <p>The emergence of large language models (LLMs) has substantially changed the artificial intelligence field, enabling its wide use over different domains. As various LLM alternatives have been developed, the current study proposes a novel decision-support framework for evaluating and benchmarking LLMs based on multicriteria decision-making (MCDM) techniques. In the proposed framework, an improved version of the best-worst method (BWM) is proposed to effectively reduce the computational complexity of assigning a critical weight for the evaluation criteria of LLMs. Then, the improved BWM is integrated with the combined compromise solution (CoCoSo) method for ranking LLM alternatives. Findings show that the improved BWM successfully computes the criteria weights with low computational complexity compared to the original BWM. According to the enhanced BWM, the ‘factual errors’ criterion received the highest significant weight (0.2681), while the ‘logical inconsistencies’ criteria obtained the lowest (0.0827). The rest of the criteria were distributed in between that range. Subsequently, CoCoSo ranked the involved LLM alternatives in two different runs based on the extracted weights. 
Sensitivity analysis was employed to evaluate the effect of the assessment criteria on LLMs’ evaluation.</p>\\n </div>\",\"PeriodicalId\":14089,\"journal\":{\"name\":\"International Journal of Intelligent Systems\",\"volume\":\"2025 1\",\"pages\":\"\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2025-08-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1155/int/2376097\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Intelligent Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1155/int/2376097\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Intelligent Systems","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1155/int/2376097","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

The emergence of large language models (LLMs) has substantially changed the artificial intelligence field, enabling their wide use across different domains. As various LLM alternatives have been developed, the current study proposes a novel decision-support framework for evaluating and benchmarking LLMs based on multicriteria decision-making (MCDM) techniques. In the proposed framework, an improved version of the best-worst method (BWM) is introduced to effectively reduce the computational complexity of assigning critical weights to the evaluation criteria of LLMs. Then, the improved BWM is integrated with the combined compromise solution (CoCoSo) method for ranking LLM alternatives. Findings show that the improved BWM successfully computes the criteria weights with low computational complexity compared with the original BWM. According to the improved BWM, the ‘factual errors’ criterion received the highest weight (0.2681), while the ‘logical inconsistencies’ criterion obtained the lowest (0.0827); the remaining criteria fell within that range. Subsequently, CoCoSo ranked the LLM alternatives in two different runs based on the extracted weights. Sensitivity analysis was employed to evaluate the effect of the assessment criteria on the LLMs’ evaluation.
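The abstract does not give the full weight vector, the decision matrix, or the mechanics of the authors' improved BWM, so the sketch below only illustrates the standard CoCoSo ranking step (Yazdani et al.'s formulation) in NumPy, not the paper's implementation. The decision matrix, the criteria other than the two with reported weights ('factual errors' 0.2681 and 'logical inconsistencies' 0.0827), and the remaining weight values are hypothetical placeholders.

```python
# Minimal sketch of the standard CoCoSo ranking step, under the assumptions above.
import numpy as np

def cocoso_rank(X, weights, benefit, lam=0.5):
    """Rank alternatives (rows of X) over criteria (columns) with CoCoSo."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Step 1: linear min-max normalisation; direction depends on criterion type.
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    R = np.where(benefit, (X - xmin) / (xmax - xmin), (xmax - X) / (xmax - xmin))

    # Step 2: weighted-sum (S) and weighted-power (P) comparability measures.
    S = (R * w).sum(axis=1)
    P = (R ** w).sum(axis=1)

    # Step 3: three appraisal scores.
    k_a = (S + P) / (S + P).sum()
    k_b = S / S.min() + P / P.min()
    k_c = (lam * S + (1 - lam) * P) / (lam * S.max() + (1 - lam) * P.max())

    # Step 4: aggregate into the final CoCoSo score; higher is better.
    k = (k_a * k_b * k_c) ** (1 / 3) + (k_a + k_b + k_c) / 3
    return np.argsort(-k), k

# Hypothetical scores for 3 LLM alternatives over 4 criteria (all treated as
# benefit criteria after inverting error counts). Only the first two weights
# come from the abstract; the rest are illustrative and sum the vector to 1.
X = [[0.8, 0.7, 0.9, 0.6],
     [0.6, 0.9, 0.7, 0.8],
     [0.9, 0.6, 0.8, 0.7]]
weights = [0.2681, 0.0827, 0.3500, 0.2992]
benefit = np.array([True, True, True, True])
order, scores = cocoso_rank(X, weights, benefit)
print("Ranking (best first):", order, "scores:", scores)
```

Running this prints the alternatives from best to worst together with their aggregate CoCoSo scores; per the abstract, the paper applies the same ranking step in two separate runs using the weights produced by the improved BWM.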

Source journal
International Journal of Intelligent Systems
Category: Engineering & Technology, Computer Science: Artificial Intelligence
CiteScore: 11.30
Self-citation rate: 14.30%
Articles per year: 304
Review time: 9 months
Journal introduction: The International Journal of Intelligent Systems serves as a forum for individuals interested in tapping into the vast theories based on intelligent systems construction. With its peer-reviewed format, the journal explores several fascinating editorials written by today's experts in the field. Because new developments are being introduced each day, there's much to be learned: examination, analysis creation, information retrieval, man-computer interactions, and more. The International Journal of Intelligent Systems uses charts and illustrations to demonstrate these ground-breaking issues, and encourages readers to share their thoughts and experiences.