A Comparison of Numeric Assessments of Ideas From Two Large Language Models: With Implications for Validating and Choosing LLMs
Daniel E. O’Leary
IEEE Intelligent Systems, journal article, published 2024-06-24
DOI: 10.1109/mis.2024.3396371
Citation count: 0
Abstract
This article compares numeric assessments generated by ChatGPT and Claude along four dimensions (novelty, feasibility, impact, and disruption) to study their ability to rate ideas. We find that the two chatbots make numeric assessments that are consistent with the expected relationships between those dimensions; for example, novelty is negatively correlated with feasibility. We also find that the two chatbots make statistically significantly different numeric assessments of the same idea information. We suggest that this type of analysis can also be used to provide a type of validation of underlying chatbot capabilities. In addition, we suggest that, as part of their chatbot requirements analysis, enterprises use this approach to ensure that the chatbot appropriately “understands” the concepts in which they are directly interested.
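The abstract does not say which statistics the study used. As a minimal sketch, assuming Pearson correlation for the relationship between two dimensions and a paired t-statistic for comparing the two chatbots on the same ideas, the two checks could look like this. The function names `pearson` and `paired_t` and all ratings below are hypothetical placeholders, not data or code from the article.

```python
import math
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of ratings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant list")
    return cov / (sx * sy)


def paired_t(a, b):
    """Paired t-statistic for the mean difference a[i] - b[i].

    Each index i is the same idea rated by two different raters
    (here, two chatbots), so the ratings are paired, not independent.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("t undefined when all differences are equal")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical 1-10 ratings for six ideas (illustrative only).
chatbot_a_novelty = [9, 8, 7, 5, 3, 2]
chatbot_a_feasibility = [2, 4, 4, 6, 7, 9]
chatbot_b_novelty = [7, 7, 6, 4, 3, 1]

# Check 1: do the dimensions relate as expected (novelty vs. feasibility)?
print("novelty/feasibility r:", pearson(chatbot_a_novelty, chatbot_a_feasibility))

# Check 2: do the two chatbots rate the same ideas differently?
print("paired t (novelty):", paired_t(chatbot_a_novelty, chatbot_b_novelty))
```

A negative `r` would match the expected novelty/feasibility relationship, and the paired t-statistic would be compared against a t distribution with n - 1 degrees of freedom to judge significance. A library routine such as SciPy's could replace these hand-written helpers; they are spelled out here only to show the arithmetic.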
About the journal:
IEEE Intelligent Systems serves users, managers, developers, researchers, and purchasers who are interested in intelligent systems and artificial intelligence, with particular emphasis on applications. Typically they are degreed professionals with backgrounds in engineering, hard science, or business. The publication emphasizes current practice and experience, together with promising new ideas that are likely to be used in the near future. Sample topic areas for feature articles include knowledge-based systems, intelligent software agents, natural-language processing, technologies for knowledge management, machine learning, data mining, adaptive and intelligent robotics, knowledge-intensive processing on the Web, and social issues relevant to intelligent systems. Also encouraged are application features covering practice at one or more companies or laboratories; full-length product stories (which require refereeing by at least three reviewers); tutorials; surveys; and case studies. Issues are often theme-based and collect articles around a contemporary topic under the auspices of a Guest Editor working with the Editor-in-Chief (EIC).