Performance of GPT-4 for planning acupuncture treatment: comparison with human clinician performance
Da-Eun Yoon, Cheol-Han Kim, Yeonhee Ryu, Ye-Seul Lee, Younbyoung Chae
Frontiers in Medicine, volume 12, article 1632303. Published 2025-09-24 (eCollection).
DOI: https://doi.org/10.3389/fmed.2025.1632303
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12504477/pdf/
Impact factor: 3.1 (JCR Q1, Medicine, General & Internal)
Citations: 0
Abstract
Background: The medical knowledge of GPT-4 has been evaluated on patient data, providing diagnostic and treatment suggestions. However, few studies have directly compared the clinical suggestions of GPT-4 with those of groups of practitioners.
Methods: This study assessed the ability of GPT-4 to make medical decisions regarding acupuncture treatment by comparing its selection of acupoints with those made by human clinicians. Ten case reports published in Korean medical journals were selected and put in a standardized format. The standardized patient information was given to 80 Korean Medicine doctors and GPT-4 to diagnose and prescribe three to five acupoints per case. To evaluate the performance of GPT-4, the similarities in acupoint selection between the doctors and GPT-4 were quantified based on the percentage overlap and correlations of the selection probabilities of acupoints in each case.
Results: The average percentage overlap for acupoints among cases at the 10% cutoff was 51.3%, i.e., more than half of the GPT-4 acupoint suggestions overlapped the acupoints selected by the doctors. In half of the cases, significant correlations were observed in the acupoint selection probabilities, implying that GPT-4 acupoint suggestions are similar to those of doctors.
Conclusions: GPT-4 made reasonable acupoint suggestions, with notable overlap observed with the prescriptions of doctors. This shows its promise for supporting medical decisions, education, and personalized medicine for patients undergoing acupuncture treatment. Future studies and validation are necessary to ensure the reliability and efficacy of applying GPT-4 in real-world settings.
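The two similarity measures named in the Methods (percentage overlap at a cutoff, and correlation of acupoint selection probabilities) can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so everything here is an assumption: overlap is taken as the share of GPT-4-suggested acupoints that at least 10% of doctors also selected, the correlation is taken to be Pearson's r over the union of acupoints, GPT-4 is assumed to have been queried more than once per case, and the prescriptions are made-up toy data rather than study data. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def selection_probabilities(prescriptions):
    """Fraction of prescriptions (one list of acupoints each) containing each acupoint."""
    counts = Counter(point for rx in prescriptions for point in set(rx))
    total = len(prescriptions)
    return {point: count / total for point, count in counts.items()}


def percentage_overlap(gpt_points, doctor_probs, cutoff=0.10):
    """Percent of GPT-suggested acupoints selected by at least `cutoff` of doctors.

    Assumed definition; the paper may define the overlap differently.
    """
    gpt_points = set(gpt_points)
    if not gpt_points:
        return 0.0
    shared = {p for p in gpt_points if doctor_probs.get(p, 0.0) >= cutoff}
    return 100.0 * len(shared) / len(gpt_points)


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either input has zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)


def probability_correlation(doctor_probs, gpt_probs):
    """Correlate selection probabilities over the union of acupoints (missing = 0)."""
    points = sorted(set(doctor_probs) | set(gpt_probs))
    xs = [doctor_probs.get(p, 0.0) for p in points]
    ys = [gpt_probs.get(p, 0.0) for p in points]
    return pearson(xs, ys)


# Toy data for a single case (hypothetical, not taken from the study).
doctors = [
    ["LI4", "ST36", "LR3"],
    ["LI4", "ST36", "SP6"],
    ["LI4", "LR3", "GB34"],
    ["ST36", "SP6", "LI4"],
]
gpt_runs = [
    ["LI4", "ST36", "PC6"],
    ["LI4", "LR3", "PC6"],
]

doctor_probs = selection_probabilities(doctors)
gpt_probs = selection_probabilities(gpt_runs)

overlap = percentage_overlap(gpt_probs.keys(), doctor_probs, cutoff=0.10)
r = probability_correlation(doctor_probs, gpt_probs)
print(f"overlap at 10% cutoff: {overlap:.1f}%")
print(f"correlation of selection probabilities: {r:.3f}")
```

In a study-like setting, these two numbers would be computed per case and the overlap then averaged across cases; testing the correlation for significance would need an additional step that this sketch leaves out.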
About the journal:
Frontiers in Medicine publishes rigorously peer-reviewed research linking basic research to clinical practice and patient care, as well as translating scientific advances into new therapies and diagnostic tools. Led by an outstanding Editorial Board of international experts, this multidisciplinary open-access journal is at the forefront of disseminating and communicating scientific knowledge and impactful discoveries to researchers, academics, clinicians and the public worldwide.
In addition to papers that provide a link between basic research and clinical practice, particular emphasis is given to studies that are directly relevant to patient care. In this spirit, the journal publishes the latest research results and medical knowledge that facilitate the translation of scientific advances into new therapies or diagnostic tools. The full listing of the Specialty Sections represented by Frontiers in Medicine is listed below. As well as the established medical disciplines, Frontiers in Medicine is launching new sections that together will facilitate:
- the use of patient-reported outcomes under real world conditions
- the exploitation of big data and the use of novel information and communication tools in the assessment of new medicines
- the scientific bases for guidelines and decisions from regulatory authorities
- access to medicinal products and medical devices worldwide
- addressing the grand health challenges around the world