Shadae K. Beale MD, Natalie Cohen MD, Beatrice Secheli MD, Donald McIntire PhD, Kimberly A. Kho MD, MPH
{"title":"比较医生和人工智能聊天机器人对发布在公共社交媒体论坛上的乳房切除术后问题的回答","authors":"Shadae K. Beale MD , Natalie Cohen MD , Beatrice Secheli MD , Donald McIntire PhD , Kimberly A. Kho MD, MPH","doi":"10.1016/j.xagr.2025.100553","DOIUrl":null,"url":null,"abstract":"<div><h3>BACKGROUND</h3><div>Within public online forums, patients often seek reassurance and guidance from the community regarding postoperative symptoms and expectations, and when to seek medical assistance. Others are using artificial intelligence in the form of online search engines or chatbots such as ChatGPT or Perplexity. Artificial intelligence chatbot assistants have been growing in popularity; however, clinicians may be hesitant to use them because of concerns about accuracy. The online networking service for medical professionals, Doximity, has expanded its resources to include a Health Insurance Portability and Accountability Act–compliant artificial intelligence writing assistant, Doximity GPT, designed to reduce the administrative burden on clinicians. Health professionals learn using a “medical model,” which greatly differs from the “health belief model” that laypeople learn through. This mismatch in learning perspectives likely contributes to a communication mismatch even during digital clinician–patient encounters, especially in patients with limited health literacy during the perioperative period when complications may arise.</div></div><div><h3>OBJECTIVE</h3><div>This study aimed to evaluate the ability of artificial intelligence chatbot assistants (Doximity GPT, Perplexity, and ChatGPT) to generate quality, accurate, and empathetic responses to postoperative patient queries that are also understandable and actionable.</div></div><div><h3>STUDY DESIGN</h3><div>Responses to 10 postoperative queries sourced from HysterSisters, a public forum for “woman-to-woman hysterectomy support,” were generated using 3 artificial intelligence chatbot assistants (Doximity GPT, Perplexity, and ChatGPT) and a minimally invasive gynecologic surgery fellowship–trained surgeon. Ten physician evaluators compared the blinded responses for quality, accuracy, and empathy. A separate pair of physician evaluators scored the responses for understandability and actionability using the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P). The final scores were the average of both reviewers’ scores. Analysis of variance was used for pairwise comparison of the evaluator scores between sources. Lastly, the Kruskal–Wallis test was used to analyze Flesch–Kincaid scoring for readability. The Pearson chi-square test was used to demonstrate the difference in reading level among the responses for each source.</div></div><div><h3>RESULTS</h3><div>Compared with a physician, Doximity GPT and ChatGPT were rated as more empathetic than a minimally invasive gynecologic surgeon, but quality and accuracy were similar across these sources. There was a significant difference between Perplexity and the other response sources, favoring the latter, for quality and accuracy (<em>P</em><.001). Perplexity and the minimally invasive gynecologic surgeon ranked similarly for empathy. Reading ease was greater for the minimally invasive gynecologic surgeon responses (60.6 [53.5–68.4]; eighth and ninth grade) than for Perplexity (40.0 [28.6–47.2], college) and ChatGPT (35.5 [28.2–42.0], college) (<em>P</em><.01). 
There was no significant difference in understandability and actionability, with all sources scored as having good understandability and average actionability.</div></div><div><h3>CONCLUSION</h3><div>As artificial intelligence chatbot assistants grow in popularity, including integration in the electronic health record, the output’s readability must reflect the general population’s health literacy to be impactful and effective. This analysis serves as a reminder for physicians to be mindful of this mismatch in readability and general health literacy when considering the integration of artificial intelligence chatbot assistants into patient care. The accuracy and consistency of these chatbots may also impact patient outcomes, making screening of utmost importance in this endeavor.</div></div>","PeriodicalId":72141,"journal":{"name":"AJOG global reports","volume":"5 3","pages":"Article 100553"},"PeriodicalIF":0.0000,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparing physician and artificial intelligence chatbot responses to posthysterectomy questions posted to a public social media forum\",\"authors\":\"Shadae K. Beale MD , Natalie Cohen MD , Beatrice Secheli MD , Donald McIntire PhD , Kimberly A. Kho MD, MPH\",\"doi\":\"10.1016/j.xagr.2025.100553\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>BACKGROUND</h3><div>Within public online forums, patients often seek reassurance and guidance from the community regarding postoperative symptoms and expectations, and when to seek medical assistance. Others are using artificial intelligence in the form of online search engines or chatbots such as ChatGPT or Perplexity. Artificial intelligence chatbot assistants have been growing in popularity; however, clinicians may be hesitant to use them because of concerns about accuracy. The online networking service for medical professionals, Doximity, has expanded its resources to include a Health Insurance Portability and Accountability Act–compliant artificial intelligence writing assistant, Doximity GPT, designed to reduce the administrative burden on clinicians. Health professionals learn using a “medical model,” which greatly differs from the “health belief model” that laypeople learn through. This mismatch in learning perspectives likely contributes to a communication mismatch even during digital clinician–patient encounters, especially in patients with limited health literacy during the perioperative period when complications may arise.</div></div><div><h3>OBJECTIVE</h3><div>This study aimed to evaluate the ability of artificial intelligence chatbot assistants (Doximity GPT, Perplexity, and ChatGPT) to generate quality, accurate, and empathetic responses to postoperative patient queries that are also understandable and actionable.</div></div><div><h3>STUDY DESIGN</h3><div>Responses to 10 postoperative queries sourced from HysterSisters, a public forum for “woman-to-woman hysterectomy support,” were generated using 3 artificial intelligence chatbot assistants (Doximity GPT, Perplexity, and ChatGPT) and a minimally invasive gynecologic surgery fellowship–trained surgeon. Ten physician evaluators compared the blinded responses for quality, accuracy, and empathy. A separate pair of physician evaluators scored the responses for understandability and actionability using the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P). 
The final scores were the average of both reviewers’ scores. Analysis of variance was used for pairwise comparison of the evaluator scores between sources. Lastly, the Kruskal–Wallis test was used to analyze Flesch–Kincaid scoring for readability. The Pearson chi-square test was used to demonstrate the difference in reading level among the responses for each source.</div></div><div><h3>RESULTS</h3><div>Compared with a physician, Doximity GPT and ChatGPT were rated as more empathetic than a minimally invasive gynecologic surgeon, but quality and accuracy were similar across these sources. There was a significant difference between Perplexity and the other response sources, favoring the latter, for quality and accuracy (<em>P</em><.001). Perplexity and the minimally invasive gynecologic surgeon ranked similarly for empathy. Reading ease was greater for the minimally invasive gynecologic surgeon responses (60.6 [53.5–68.4]; eighth and ninth grade) than for Perplexity (40.0 [28.6–47.2], college) and ChatGPT (35.5 [28.2–42.0], college) (<em>P</em><.01). There was no significant difference in understandability and actionability, with all sources scored as having good understandability and average actionability.</div></div><div><h3>CONCLUSION</h3><div>As artificial intelligence chatbot assistants grow in popularity, including integration in the electronic health record, the output’s readability must reflect the general population’s health literacy to be impactful and effective. This analysis serves as a reminder for physicians to be mindful of this mismatch in readability and general health literacy when considering the integration of artificial intelligence chatbot assistants into patient care. The accuracy and consistency of these chatbots may also impact patient outcomes, making screening of utmost importance in this endeavor.</div></div>\",\"PeriodicalId\":72141,\"journal\":{\"name\":\"AJOG global reports\",\"volume\":\"5 3\",\"pages\":\"Article 100553\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"AJOG global reports\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2666577825001145\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"AJOG global reports","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666577825001145","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Comparing physician and artificial intelligence chatbot responses to posthysterectomy questions posted to a public social media forum
BACKGROUND
Within public online forums, patients often seek reassurance and guidance from the community regarding postoperative symptoms, expectations, and when to seek medical assistance. Others are turning to artificial intelligence in the form of online search engines or chatbots such as ChatGPT or Perplexity. Artificial intelligence chatbot assistants have been growing in popularity; however, clinicians may be hesitant to use them because of concerns about accuracy. Doximity, the online networking service for medical professionals, has expanded its resources to include a Health Insurance Portability and Accountability Act–compliant artificial intelligence writing assistant, Doximity GPT, designed to reduce the administrative burden on clinicians. Health professionals learn through a “medical model,” which differs greatly from the “health belief model” through which laypeople learn. This mismatch in learning perspectives likely contributes to communication gaps even during digital clinician–patient encounters, especially for patients with limited health literacy during the perioperative period, when complications may arise.
OBJECTIVE
This study aimed to evaluate the ability of artificial intelligence chatbot assistants (Doximity GPT, Perplexity, and ChatGPT) to generate high-quality, accurate, and empathetic responses to postoperative patient queries that are also understandable and actionable.
STUDY DESIGN
Responses to 10 postoperative queries sourced from HysterSisters, a public forum for “woman-to-woman hysterectomy support,” were generated by 3 artificial intelligence chatbot assistants (Doximity GPT, Perplexity, and ChatGPT) and a minimally invasive gynecologic surgery fellowship–trained surgeon. Ten physician evaluators compared the blinded responses for quality, accuracy, and empathy. A separate pair of physician evaluators scored the responses for understandability and actionability using the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P); the final score for each response was the average of the two reviewers’ scores. Analysis of variance was used for pairwise comparisons of evaluator scores between sources. The Kruskal–Wallis test was used to compare Flesch–Kincaid readability scores, and the Pearson chi-square test was used to assess differences in reading level among the responses from each source.
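The statistical workflow described above can be reproduced with standard scientific Python tooling. The sketch below is illustrative only: the evaluator scores, readability values, and contingency counts are hypothetical placeholders, not study data, and SciPy is assumed to be available.

```python
# Illustrative sketch of the analysis pipeline; all data are hypothetical.
from scipy import stats

# Evaluator quality scores (e.g., 1-5 Likert) for each response source.
scores = {
    "surgeon":      [5, 4, 5, 4, 5, 4, 5, 5, 4, 5],
    "doximity_gpt": [5, 5, 4, 4, 5, 5, 4, 5, 4, 5],
    "chatgpt":      [4, 5, 4, 5, 4, 5, 5, 4, 5, 4],
    "perplexity":   [3, 3, 4, 2, 3, 3, 2, 3, 3, 2],
}

# One-way ANOVA across sources (pairwise contrasts follow the same pattern,
# run on two groups at a time).
f_stat, p_anova = stats.f_oneway(*scores.values())

# Kruskal-Wallis test on Flesch-Kincaid reading-ease scores per source.
reading_ease = {
    "surgeon":    [60.6, 58.1, 64.3],
    "perplexity": [40.0, 31.5, 45.2],
    "chatgpt":    [35.5, 29.8, 41.0],
}
h_stat, p_kw = stats.kruskal(*reading_ease.values())

# Pearson chi-square on a source-by-reading-level contingency table
# (rows: sources; columns: counts of responses per grade band).
contingency = [
    [7, 3, 0],  # surgeon: 8th-9th grade, 10th-12th grade, college
    [0, 2, 8],  # perplexity
    [0, 1, 9],  # chatgpt
]
chi2, p_chi2, dof, _expected = stats.chi2_contingency(contingency)

print(f"ANOVA p={p_anova:.3f}, Kruskal-Wallis p={p_kw:.3f}, chi-square p={p_chi2:.3f}")
```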
RESULTS
Doximity GPT and ChatGPT were rated as more empathetic than the minimally invasive gynecologic surgeon, whereas quality and accuracy were similar across these three sources. For quality and accuracy, there was a significant difference between Perplexity and the other response sources, favoring the latter (P<.001). Perplexity and the minimally invasive gynecologic surgeon ranked similarly for empathy. Reading ease was greater for the minimally invasive gynecologic surgeon’s responses (60.6 [53.5–68.4]; eighth to ninth grade) than for Perplexity (40.0 [28.6–47.2], college) and ChatGPT (35.5 [28.2–42.0], college) (P<.01). There was no significant difference in understandability or actionability; all sources scored as having good understandability and average actionability.
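For context, the reading-ease values reported above follow the Flesch Reading Ease formula, and the grade labels correspond to its standard interpretation bands. The short sketch below uses hypothetical word, sentence, and syllable counts (not study data) to show how a raw score maps to those bands; in practice a readability library such as textstat derives the counts automatically.

```python
# Flesch Reading Ease and its standard grade-band interpretation.
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def grade_band(score: float) -> str:
    if score >= 60:
        return "8th-9th grade (plain English)"
    if score >= 50:
        return "10th-12th grade"
    if score >= 30:
        return "college"
    return "college graduate"

# Hypothetical counts for a single chatbot response:
score = flesch_reading_ease(words=180, sentences=9, syllables=310)
print(f"{score:.1f} -> {grade_band(score)}")  # ~40.8 -> college
```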
CONCLUSION
As artificial intelligence chatbot assistants grow in popularity, including through integration into the electronic health record, their output’s readability must reflect the general population’s health literacy to be impactful and effective. This analysis serves as a reminder for physicians to be mindful of the mismatch between output readability and general health literacy when considering the integration of artificial intelligence chatbot assistants into patient care. The accuracy and consistency of these chatbots may also affect patient outcomes, making careful screening of their responses of utmost importance in this endeavor.
AJOG Global Reports, volume 5, issue 3, Article 100553, August 2025.