Alejandro García-Rudolph, David Sanchez-Pinsach, Eloy Opisso
DOI: 10.1093/arclin/acae068 (published 2024-09-04)
Evaluating AI Models: Performance Validation Using Formal Multiple-Choice Questions in Neuropsychology.
High-quality, accessible education is crucial for advancing neuropsychology. A recent study identified key barriers to board certification in clinical neuropsychology, including time constraints and insufficient specialized knowledge. To address these challenges, this study explored the capabilities of two advanced artificial intelligence (AI) language models, GPT-3.5 (free version) and GPT-4.0 (subscription version), by evaluating their performance on 300 questions modeled on the American Board of Professional Psychology in Clinical Neuropsychology (ABPP-CN) examination. GPT-4.0 achieved an overall accuracy of 80.0%, compared with 65.7% for GPT-3.5. In the "Assessment" category, GPT-4.0 also outperformed GPT-3.5, with an accuracy of 73.4% versus 58.6% (p = 0.012). Because the "Assessment" category (128 questions) had the highest error rate for both models, it was analyzed in more detail. A thematic analysis of the 26 incorrectly answered questions revealed 8 main themes and 17 specific codes, highlighting significant gaps in areas such as "Neurodegenerative Diseases" and "Neuropsychological Testing and Interpretation."
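The reported "Assessment" figures can be sanity-checked from raw counts. The sketch below assumes the correct-answer counts implied by the percentages (94 and 75 of 128 questions, back-computed here, not stated in the abstract) and applies a pooled two-proportion z-test without continuity correction. The abstract does not say which test produced p = 0.012, so this is an illustration under that assumption, not the authors' analysis; the helper names are hypothetical.

```python
import math


def accuracy(correct: int, total: int) -> float:
    """Fraction of questions answered correctly."""
    return correct / total


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test, no continuity correction.

    Returns (z, two-sided p-value). It treats the two samples as independent,
    which ignores that both models answered the same questions.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed counts, back-computed from the abstract's percentages:
# 73.4% of 128 -> 94 correct (GPT-4.0); 58.6% of 128 -> 75 correct (GPT-3.5).
n_assess = 128
gpt4_correct, gpt35_correct = 94, 75

z, p = two_proportion_z_test(gpt4_correct, n_assess, gpt35_correct, n_assess)
print(f"GPT-4.0 Assessment accuracy: {accuracy(gpt4_correct, n_assess):.1%}")
print(f"GPT-3.5 Assessment accuracy: {accuracy(gpt35_correct, n_assess):.1%}")
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Under these assumed counts the test lands close to the reported p-value. Because both models answered the same questions, a paired test such as McNemar's would arguably be more appropriate, but it needs per-question agreement counts that the abstract does not report.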