Title: Comparative Efficacy of AI LLMs in Clinical Social Work: ChatGPT-4, Gemini, Copilot
Authors: Hacer Taşkıran Tepe, Hüsnünur Aslantürk
Journal: Research on Social Work Practice (impact factor 1.7; JCR Q1, Social Work)
Publication date: 2025-01-17
DOI: 10.1177/10497315241313071 (https://doi.org/10.1177/10497315241313071)
Citation count: 0
Abstract
Purpose: This study examines the comparative efficacy of three AI large language models (LLMs) in clinical social work: ChatGPT-4, Gemini, and Microsoft Copilot.

Method: By presenting scenarios of varying complexity, the study assessed the models' performance using the Ateşman Readability Index and a Likert-type accuracy scale.

Results: Gemini had the highest accuracy, while Microsoft Copilot excelled in readability. Significant differences were found in accuracy scores (p = .003), although readability differences were not statistically significant (p = .054). No correlation was found between case complexity and either accuracy or readability.

Discussion: Despite the differences, none of the models fully met all accuracy standards, indicating areas for further improvement. The findings suggest that while LLMs offer promise in social work, they require refinement to better meet the field's needs.
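For readers unfamiliar with the Ateşman Readability Index named in the Method, the following is a minimal sketch of how such a score can be computed for Turkish text. The abstract does not give the formula or the authors' tooling; the coefficients below are the commonly cited Ateşman formula (198.825 − 40.175 × syllables per word − 2.610 × words per sentence), and the function names, the vowel-counting syllable heuristic, and the punctuation-based sentence splitting are all illustrative assumptions.

```python
import re

# Assumption: Turkish syllables each contain exactly one vowel, so counting
# vowels approximates the syllable count. Circumflex vowels are included.
TURKISH_VOWELS = set("aeıioöuüâîûAEIİOÖUÜÂÎÛ")


def count_syllables(word: str) -> int:
    """Approximate the number of syllables in a Turkish word by counting vowels."""
    return sum(1 for ch in word if ch in TURKISH_VOWELS)


def atesman_score(text: str) -> float:
    """Return an Ateşman-style readability score (higher means easier to read).

    Sentence and word splitting here are deliberately simple and hypothetical;
    they are not taken from the study.
    """
    # Split on runs of sentence-ending punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # A word is a run of Unicode letters (digits and underscores excluded).
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    words_per_sentence = len(words) / len(sentences)
    return 198.825 - 40.175 * syllables_per_word - 2.610 * words_per_sentence


if __name__ == "__main__":
    sample = "Ali okula gitti. Ayşe evde kaldı."
    print(round(atesman_score(sample), 2))
```

Shorter words and shorter sentences both raise the score, so a model response written in brief, plain sentences would rate as more readable under this index.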
About the journal:
Research on Social Work Practice, sponsored by the Society for Social Work and Research, is a disciplinary journal devoted to the publication of empirical research concerning the methods and outcomes of social work practice. Social work practice is broadly interpreted to refer to the application of intentionally designed social work intervention programs to problems of societal and/or interpersonal importance, including behavior analysis or psychotherapy involving individuals; case management; practice involving couples, families, and small groups; community practice education; and the development, implementation, and evaluation of social policies.