O. S. Albahri, M. A. Alsalem, A. S. Albahri, Moamin A. Mahmoud, Laith Alzubaidi, A. H. Alamoodi, Iman Mohamad Sharaf
{"title":"大型语言模型评估的一种改进的最佳-最差方法与组合折衷解","authors":"O. S. Albahri, M. A. Alsalem, A. S. Albahri, Moamin A. Mahmoud, Laith Alzubaidi, A. H. Alamoodi, Iman Mohamad Sharaf","doi":"10.1155/int/2376097","DOIUrl":null,"url":null,"abstract":"<div>\n <p>The emergence of large language models (LLMs) has substantially changed the artificial intelligence field, enabling its wide use over different domains. As various LLM alternatives have been developed, the current study proposes a novel decision-support framework for evaluating and benchmarking LLMs based on multicriteria decision-making (MCDM) techniques. In the proposed framework, an improved version of the best-worst method (BWM) is proposed to effectively reduce the computational complexity of assigning a critical weight for the evaluation criteria of LLMs. Then, the improved BWM is integrated with the combined compromise solution (CoCoSo) method for ranking LLM alternatives. Findings show that the improved BWM successfully computes the criteria weights with low computational complexity compared to the original BWM. According to the enhanced BWM, the ‘factual errors’ criterion received the highest significant weight (0.2681), while the ‘logical inconsistencies’ criteria obtained the lowest (0.0827). The rest of the criteria were distributed in between that range. Subsequently, CoCoSo ranked the involved LLM alternatives in two different runs based on the extracted weights. Sensitivity analysis was employed to evaluate the effect of the assessment criteria on LLMs’ evaluation.</p>\n </div>","PeriodicalId":14089,"journal":{"name":"International Journal of Intelligent Systems","volume":"2025 1","pages":""},"PeriodicalIF":3.7000,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1155/int/2376097","citationCount":"0","resultStr":"{\"title\":\"An Improved Best-Worst Method Integrated With Combined Compromise Solution for Evaluating Large Language Models\",\"authors\":\"O. S. Albahri, M. A. Alsalem, A. S. Albahri, Moamin A. Mahmoud, Laith Alzubaidi, A. H. Alamoodi, Iman Mohamad Sharaf\",\"doi\":\"10.1155/int/2376097\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n <p>The emergence of large language models (LLMs) has substantially changed the artificial intelligence field, enabling its wide use over different domains. As various LLM alternatives have been developed, the current study proposes a novel decision-support framework for evaluating and benchmarking LLMs based on multicriteria decision-making (MCDM) techniques. In the proposed framework, an improved version of the best-worst method (BWM) is proposed to effectively reduce the computational complexity of assigning a critical weight for the evaluation criteria of LLMs. Then, the improved BWM is integrated with the combined compromise solution (CoCoSo) method for ranking LLM alternatives. Findings show that the improved BWM successfully computes the criteria weights with low computational complexity compared to the original BWM. According to the enhanced BWM, the ‘factual errors’ criterion received the highest significant weight (0.2681), while the ‘logical inconsistencies’ criteria obtained the lowest (0.0827). The rest of the criteria were distributed in between that range. Subsequently, CoCoSo ranked the involved LLM alternatives in two different runs based on the extracted weights. 
Sensitivity analysis was employed to evaluate the effect of the assessment criteria on LLMs’ evaluation.</p>\\n </div>\",\"PeriodicalId\":14089,\"journal\":{\"name\":\"International Journal of Intelligent Systems\",\"volume\":\"2025 1\",\"pages\":\"\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2025-08-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1155/int/2376097\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Intelligent Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1155/int/2376097\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Intelligent Systems","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1155/int/2376097","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
An Improved Best-Worst Method Integrated With Combined Compromise Solution for Evaluating Large Language Models
The emergence of large language models (LLMs) has substantially changed the artificial intelligence field, enabling their wide use across different domains. Given that various LLM alternatives have been developed, the current study proposes a novel decision-support framework for evaluating and benchmarking LLMs based on multicriteria decision-making (MCDM) techniques. In the proposed framework, an improved version of the best-worst method (BWM) is introduced to reduce the computational complexity of assigning weights to the evaluation criteria of LLMs. The improved BWM is then integrated with the combined compromise solution (CoCoSo) method for ranking LLM alternatives. Findings show that the improved BWM computes the criteria weights with lower computational complexity than the original BWM. According to the enhanced BWM, the ‘factual errors’ criterion received the highest weight (0.2681), while the ‘logical inconsistencies’ criterion received the lowest (0.0827); the remaining criteria fell between these two values. Subsequently, CoCoSo ranked the LLM alternatives in two different runs based on the derived weights. A sensitivity analysis was performed to examine the effect of the assessment criteria on the LLM evaluation.
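To illustrate the ranking step the abstract describes (criteria weights fed into CoCoSo), the sketch below implements the standard CoCoSo procedure in Python. The decision matrix, the criteria weights, and the helper name `cocoso_rank` are hypothetical placeholders for illustration; they are not the paper's data, nor its improved BWM, which is not reproduced here.

```python
# A minimal CoCoSo (Combined Compromise Solution) sketch, assuming a hypothetical
# decision matrix and illustrative weights (not the paper's improved-BWM weights).
import numpy as np

def cocoso_rank(matrix, weights, benefit, lam=0.5):
    """Rank alternatives with the standard CoCoSo method.

    matrix  : (m alternatives x n criteria) performance scores
    weights : criteria weights summing to 1 (e.g., produced by a BWM variant)
    benefit : boolean array, True where larger values are better
    lam     : balance parameter for the third appraisal score (commonly 0.5)
    """
    X = np.asarray(matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    rng = X.max(axis=0) - X.min(axis=0)

    # Step 1: min-max normalisation, handling benefit vs. cost criteria.
    R = np.where(benefit, (X - X.min(axis=0)) / rng, (X.max(axis=0) - X) / rng)

    # Step 2: weighted-sum (S) and power-weighted (P) comparability sequences.
    S = (R * w).sum(axis=1)
    P = (R ** w).sum(axis=1)

    # Step 3: three appraisal scores.
    ka = (S + P) / (S + P).sum()
    kb = S / S.min() + P / P.min()
    kc = (lam * S + (1 - lam) * P) / (lam * S.max() + (1 - lam) * P.max())

    # Step 4: aggregate into the final CoCoSo score; higher means better rank.
    k = (ka * kb * kc) ** (1 / 3) + (ka + kb + kc) / 3
    return np.argsort(-k), k

# Hypothetical example: 3 LLM alternatives scored on 4 criteria, where the first
# two criteria (e.g., error counts) are cost-type and the last two are benefit-type.
scores = [[3, 2, 7, 8],
          [5, 1, 6, 9],
          [2, 4, 8, 7]]
weights = [0.27, 0.08, 0.35, 0.30]   # illustrative only
order, k = cocoso_rank(scores, weights, benefit=np.array([False, False, True, True]))
print(order, np.round(k, 3))
```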
Journal Introduction:
The International Journal of Intelligent Systems serves as a forum for individuals interested in tapping into the vast theories based on intelligent systems construction. With its peer-reviewed format, the journal explores several fascinating editorials written by today's experts in the field. Because new developments are being introduced each day, there's much to be learned: examination, analysis, creation, information retrieval, man–computer interactions, and more. The International Journal of Intelligent Systems uses charts and illustrations to demonstrate these ground-breaking issues, and encourages readers to share their thoughts and experiences.