Accuracy of artificial intelligence platforms on equine topics
S. Aldworth-Yang, S.J. Coleman, K. O'Reilly, D. Catalano
Journal of Equine Veterinary Science, Volume 148, Article 105506 (May 2025)
DOI: 10.1016/j.jevs.2025.105506
URL: https://www.sciencedirect.com/science/article/pii/S0737080625001649
Abstract
Artificial intelligence (AI) is becoming increasingly popular as a resource for information across all topics, including equine-related areas. However, AI models pull information from a variety of sources and do not always distinguish between fact and opinion. The objective of this study was to evaluate the accuracy of AI-generated answers on equine topics from three AI platforms. Our hypothesis was that AI platforms could answer basic equine questions well but would not be able to accurately answer more complex questions or topics. The three AI platforms (P) evaluated were ChatGPT (CGPT), Microsoft Copilot (MicCP), and Extension Bot (ExtBot). Researchers asked 40 questions on general horse care, facilities management, nutrition, genetics, and reproduction (topics; T) at four levels (L): beginner (beg.), intermediate (int.), advanced (adv.), and “hot topics” (HT; areas of current interest in the industry). Answers were evaluated for accuracy, relevance, thoroughness, and source quality (10 points each; total score [TS] out of 40 points). Accuracy was determined by referencing textbooks and topic experts. Data were analyzed using PROC GLM in SAS (v. 9.4). Both CGPT and MicCP answered 40 of 40 questions, whereas ExtBot answered 33 of 40. Total score was not affected by P (P = 0.197) or T (P = 0.536), but there was an effect of L (P = 0.002): across platforms, beg. and int. questions had higher TS than adv. or HT questions, indicating that topic complexity plays a role in answer quality. Accuracy was affected by P (P < 0.001), L (P < 0.001), and T (P = 0.015): ExtBot scored lower than both CGPT and MicCP, HT and adv. questions scored lower than beg. or int. questions, and reproduction scored lower than all other topics. Relevance was affected by P (P = 0.042) and L (P < 0.001) but not T (P = 0.099): ChatGPT answers contained more irrelevant information than MicCP and ExtBot answers, which may indicate a weakness in parsing out only essential information, and answers to HT questions included less relevant information than int. answers. Thoroughness was affected by P (P < 0.001) and L (P = 0.002) but not T (P = 0.282): ChatGPT was the most thorough, followed by MicCP and then ExtBot, and both beg. and int. answers were more thorough than HT or adv. answers. Source quality was affected by P (P = 0.037) but not L (P = 0.645) or T (P = 0.558), with ExtBot using higher-quality sources than CGPT and MicCP. Overall, the AI programs struggled with complex topics and were inconsistent in their strengths. This research demonstrates that although AI tools may have potential as resources, they currently fall short of the expertise and knowledge offered by equine extension specialists.
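The abstract specifies the design concretely: 40 questions spanning 5 topics and 4 levels, graded on four 10-point criteria summed to a 40-point total, with factor effects tested via PROC GLM in SAS 9.4. As a rough illustration only, the sketch below mirrors that structure in Python with statsmodels; the column names, synthetic scores, and main-effects model form are assumptions for illustration, not the authors' actual code or data.

```python
# Hypothetical sketch of the scoring layout and GLM-style analysis described
# in the abstract. Placeholder data only; the study itself used SAS PROC GLM.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
platforms = ["CGPT", "MicCP", "ExtBot"]
topics = ["care", "facilities", "nutrition", "genetics", "reproduction"]
levels = ["beg", "int", "adv", "HT"]

# One row per graded answer: four 10-point criteria summed to a 40-point total.
# 5 topics x 4 levels x 2 questions per cell = 40 questions, as in the study.
rows = []
for p in platforms:
    for t in topics:
        for lvl in levels:
            for q in range(2):  # assumed 2 questions per topic-by-level cell
                scores = rng.integers(5, 11, size=4)  # accuracy, relevance, thoroughness, sources
                rows.append({"platform": p, "topic": t, "level": lvl,
                             "accuracy": scores[0], "relevance": scores[1],
                             "thoroughness": scores[2], "source_quality": scores[3],
                             "total": int(scores.sum())})
df = pd.DataFrame(rows)

# Main-effects linear model analogous to the reported analysis:
# total score ~ platform + level + topic. With this balanced layout,
# Type I/II/III sums of squares coincide for the main effects.
model = ols("total ~ C(platform) + C(level) + C(topic)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```

The same model can be refit with accuracy, relevance, thoroughness, or source_quality as the response to reproduce the per-criterion comparisons the abstract reports.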
Journal overview:
Journal of Equine Veterinary Science (JEVS) is an international publication designed for the practicing equine veterinarian, equine researcher, and other equine health care specialists. Published monthly, each issue of JEVS includes original research, reviews, case reports, short communications, and clinical techniques from leaders in the equine veterinary field, covering such topics as laminitis, reproduction, infectious disease, parasitology, behavior, podology, internal medicine, surgery, and nutrition.