AI evaluation of ChatGPT and human generated image/textual contents by bipolar generalized fuzzy hypergraph

Impact Factor: 10.7 · Zone 2 (Computer Science) · JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Review · Pub Date: 2025-01-08 · DOI: 10.1007/s10462-024-11015-7

Abbas Amini, Narjes Firouzkouhi, Wael Farag, Omar Ali, Isam Zabalawi, Bijan Davvaz

Artificial Intelligence Review, Volume 58, Issue 3 · Journal Article
Article: https://link.springer.com/article/10.1007/s10462-024-11015-7
PDF: https://link.springer.com/content/pdf/10.1007/s10462-024-11015-7.pdf

Citations: 0

Abstract

Artificial Intelligence (AI) tools such as ChatGPT (Chat Generative Pre-Trained Transformer) are revolutionizing the culture of industry, science, and education, both positively and negatively. The main objectives of this study are to address uncertainty and vagueness in ChatGPT systems, apply bipolarity as a two-sided state of data, model a generalized graph-based network with its derivations, develop a bipolar multi-dimensional fuzzy relation, advance entropy metrics for quantifying ambiguity, cluster entities based on level cuts, present pattern recognition in terms of a statistical correlation coefficient, analyze a speech recognition framework, and schedule online surgeries on the basis of blockchain technology. The innovation lies in the self-evaluation of ChatGPT systems, the merging of bipolarity with the generalized fuzzy hypergraph approach, the interpretation of graph-based patterns, and the benchmarking of AI analysis and metrics. To assess the efficiency of the AI bipolar generalized fuzzy hypergraph (BGFH) model, the key conceptual benchmarks are a clustering technique for detecting patterns and similar groups of data, statistical methods for the analysis of pattern recognition, and entropy metrics for quantifying the fuzziness within a system. The framework provides important operations and properties such as union, intersection, complement, homomorphism, and isomorphism, and verifies that the overlap (intersection) and complement of two strong BGFHs are themselves strong BGFHs. In addition, reflexive, symmetric, transitive, overlapping, and integration properties are defined using the bipolar multi-dimensional fuzzy relation. Eleven classes are derived from different values of \(t\in [0,1]\) and \(s\in [-1,0]\), grouping analogous data and thereby aiding similarity detection among generated outputs. Through this approach, a new pattern recognition method based on the correlation coefficient is used as a data evaluation technique.
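The clustering by level cuts mentioned above can be sketched as follows. This is only an illustration under stated assumptions: it uses the convention commonly found in the bipolar fuzzy set literature, in which the \((t,s)\)-level cut keeps the elements whose positive degree is at least \(t\) and whose negative degree is at most \(s\). The abstract does not spell out the paper's exact definition, and the item names, membership values, and the function name `level_cut` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical bipolar memberships (not data from the paper):
# item -> (positive degree in [0, 1], negative degree in [-1, 0]).
memberships: Dict[str, Tuple[float, float]] = {
    "text_a": (0.9, -0.1),
    "text_b": (0.6, -0.4),
    "image_a": (0.3, -0.7),
    "image_b": (0.8, -0.5),
}


def level_cut(data: Dict[str, Tuple[float, float]], t: float, s: float) -> List[str]:
    """Return the items with positive degree >= t and negative degree <= s.

    This is the assumed (t, s)-level cut convention; the names are sorted
    so that the result is deterministic.
    """
    if not (0.0 <= t <= 1.0 and -1.0 <= s <= 0.0):
        raise ValueError("expected t in [0, 1] and s in [-1, 0]")
    return sorted(name for name, (pos, neg) in data.items() if pos >= t and neg <= s)


if __name__ == "__main__":
    # Tightening the thresholds shrinks the class of items that pass the cut.
    for t, s in [(0.0, 0.0), (0.5, -0.3), (0.8, -0.5)]:
        print((t, s), level_cut(memberships, t, s))
```

Sweeping \(t\) and \(s\) over a grid of thresholds produces a nested family of classes, which is one plausible reading of how a fixed number of classes could arise from the level cuts.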
The highest value, 0.145, is obtained for patterns \(C_{1}\) and D, indicating the most positive correlation between patterns, while patterns \(C_{4}\) and D, with a value of \(-0.35\), are negatively correlated. The results show that the entropy measure of visual data (0.75) is higher than that of textual data (0.68), indicating more vagueness and ambiguity in visual generation systems. For textual data, \(E^{P}(U)\) and \(E^{N}(U)\) are 0.62 for human-created content and 0.45 for ChatGPT-generated content, respectively, whereas for visual data they are 0.25 and 0.66, respectively; the entropy of ChatGPT-generated visual data is thus higher than that of ChatGPT-generated textual data. In the speech recognition analysis, the highest human performance degree is assigned to the word “a” (0.89), while the lowest belongs to the word “i” (0.81). The highest AI performance degree is assigned to the word “it” \((-0.7)\), and the lowest to the word “the” \((-0.88)\). The overall entropy measure is 0.23; the entropy measure of AI-based data is 0.35, whereas that of human-based data is 0.29, representing higher vagueness for AI-based data. In surgical case scheduling, the bipolar value \((0.9,-0.1)\) is assigned to the surgeon with the highest positive performance (0.9) and the lowest negative performance \((-0.1)\); this indicates the superior overall performance \((\lambda _{a}=0.8)\) of the leader during the AI blockchain robotic colon surgery. The worst overall performance (0.22) belongs to a surgeon whom the lead physician is required to remove from the surgery team.
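The reported leader score is numerically consistent with adding the two components of the bipolar pair, since \(0.9 + (-0.1) = 0.8\). The sketch below assumes that additive rule purely for illustration; the abstract does not confirm that this is how \(\lambda\) is defined in the paper. Only the pair \((0.9,-0.1)\) comes from the abstract. The other team members, their values, and the names `overall_score` and `rank_team` are hypothetical.

```python
from typing import Dict, List, Tuple


def overall_score(positive: float, negative: float) -> float:
    """Collapse a bipolar pair into one number by adding both degrees.

    The additive rule is an assumption that happens to reproduce
    0.9 + (-0.1) = 0.8; it is not taken from the paper.
    """
    if not (0.0 <= positive <= 1.0 and -1.0 <= negative <= 0.0):
        raise ValueError("expected positive in [0, 1] and negative in [-1, 0]")
    return positive + negative


def rank_team(team: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order team members from best to worst assumed overall score."""
    return sorted(team, key=lambda name: overall_score(*team[name]), reverse=True)


# Hypothetical team: only the leader's pair (0.9, -0.1) is from the abstract.
team: Dict[str, Tuple[float, float]] = {
    "leader": (0.9, -0.1),
    "surgeon_b": (0.7, -0.3),
    "surgeon_c": (0.5, -0.4),
}

if __name__ == "__main__":
    for name in rank_team(team):
        print(name, round(overall_score(*team[name]), 2))
```

Under this rule the member at the bottom of the ranking is the candidate for removal, mirroring the scheduling decision described in the abstract.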
The outcomes are validated through a comparative analysis against the classical bipolar fuzzy graph, the bipolar fuzzy hypergraph, and NLP (natural language processing) approaches.
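The abstract reports entropy values for the positive and negative sides of the data but does not state the entropy formula. As a stand-in, the sketch below uses the classical linear index of fuzziness, \(\frac{2}{n}\sum_{i}\min(m_i, 1-m_i)\), applied to the absolute value of each membership degree so that it works for both the positive side \([0,1]\) and the negative side \([-1,0]\). This choice of measure and the function name `fuzziness_index` are assumptions, and the sample values are hypothetical.

```python
from typing import Sequence


def fuzziness_index(degrees: Sequence[float]) -> float:
    """Linear index of fuzziness of a list of membership degrees.

    Each degree may lie in [-1, 1]; its absolute value m is used, so the
    index is 0 when every m is 0 or 1 (crisp data) and 1 when every m is
    0.5 (maximally vague data). This measure is a stand-in, not the
    paper's entropy definition.
    """
    if not degrees:
        raise ValueError("at least one membership degree is required")
    total = 0.0
    for degree in degrees:
        m = abs(degree)
        if m > 1.0:
            raise ValueError("membership degrees must lie in [-1, 1]")
        total += min(m, 1.0 - m)
    return 2.0 * total / len(degrees)


if __name__ == "__main__":
    positive_side = [0.9, 0.6, 0.3, 0.8]      # hypothetical positive degrees
    negative_side = [-0.1, -0.4, -0.7, -0.5]  # hypothetical negative degrees
    print("positive:", fuzziness_index(positive_side))
    print("negative:", fuzziness_index(negative_side))
```

A higher index on one side than on the other would be read in the same way as the abstract reads its entropy values: the side with the larger value carries more vagueness.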


Source journal

Artificial Intelligence Review (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 22.00
Self-citation rate: 3.30%
Articles published: 194
Review time: 5.3 months

About the journal: Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.