Muhammad Anees, Fareed Ahmed Shaikh, Hafsah Shaikh, Nadeem Ahmed Siddiqui, Zia Ur Rehman
Journal of Vascular Surgery: Venous and Lymphatic Disorders · DOI: 10.1016/j.jvsv.2024.101985 · Published 2024-09-25 · Impact factor 2.8 · JCR Q2 (Peripheral Vascular Disease)
Assessing the quality of ChatGPT's responses to questions related to radiofrequency ablation for varicose veins.
Objective: This study aimed to evaluate the accuracy and reproducibility of information provided by ChatGPT in response to frequently asked questions about radiofrequency ablation (RFA) for varicose veins.
Methods: This cross-sectional study was conducted at The Aga Khan University Hospital, Karachi, Pakistan. A set of 18 frequently asked questions regarding RFA for varicose veins was compiled from credible online sources and presented to ChatGPT twice, each time in a separate new chat session. Twelve experienced vascular surgeons (each with >2 years of experience and ≥20 RFA procedures performed annually) independently evaluated the accuracy of the responses using a 4-point Likert scale and assessed their reproducibility.
Results: Most evaluators were male (n = 10/12 [83.3%]), with an average of 12.3 ± 6.2 years of experience as a vascular surgeon. Six evaluators (50.0%) were from the United Kingdom, followed by three from Saudi Arabia (25.0%), two from Pakistan (16.7%), and one from the United States (8.3%). Among the 216 accuracy grades, most evaluators graded the responses as comprehensive (n = 87/216 [40.3%]) or accurate but insufficient (n = 70/216 [32.4%]), whereas only 17.1% (n = 37/216) were graded as a mixture of accurate and inaccurate information and 10.2% (n = 22/216) as entirely inaccurate. Overall, 89.8% of the responses (n = 194/216) were deemed reproducible. Of the total responses, 70.4% (n = 152/216) were classified as good quality and reproducible. The remaining responses were of poor quality, with 19.4% reproducible (n = 42/216) and 10.2% nonreproducible (n = 22/216). Inter-rater disagreement among the vascular surgeons for overall responses was nonsignificant (Fleiss' kappa, -0.028; P = .131).
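The inter-rater agreement statistic reported above, Fleiss' kappa, is computed from a subjects-by-categories count matrix (here, 18 questions × 4 Likert categories, each rated by 12 surgeons). The following is a minimal illustrative sketch, not the authors' analysis code; the small input matrix below is hypothetical, not the study's data:

```python
from typing import Sequence

def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects x categories count matrix.

    ratings[i][j] = number of raters who assigned subject i to category j.
    Every row is assumed to sum to the same number of raters n.
    """
    N = len(ratings)           # number of subjects (e.g., 18 questions)
    n = sum(ratings[0])        # raters per subject (e.g., 12 surgeons)
    k = len(ratings[0])        # number of categories (e.g., 4 Likert grades)
    # Proportion of all assignments falling in each category.
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    # Observed agreement for each subject.
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P) / N              # mean observed agreement
    P_e = sum(pj * pj for pj in p)  # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical toy example: 3 subjects, 2 categories, 3 raters each.
kappa = fleiss_kappa([[3, 0], [0, 3], [3, 0]])  # perfect agreement -> 1.0
```

A kappa near zero, as reported in the study (-0.028), indicates agreement no better than chance across raters, which is consistent with the "limited inter-rater reliability" noted in the conclusions.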
Conclusions: ChatGPT provided generally accurate and reproducible information on RFA for varicose veins; however, variability in response quality and limited inter-rater reliability highlight the need for further improvements. Although it has the potential to enhance patient education and support healthcare decision-making, improvements in its training, validation, transparency, and mechanisms to address inaccurate or incomplete information are essential.
Journal introduction:
Journal of Vascular Surgery: Venous and Lymphatic Disorders is one of a series of specialist journals launched by the Journal of Vascular Surgery. It aims to be the premier international journal for the medical, endovascular, and surgical management of venous and lymphatic disorders. It publishes high-quality clinical and research articles, case reports, techniques, and practice manuscripts related to all aspects of venous and lymphatic disorders, including malformations and wound care, with an emphasis on the practicing clinician. The journal seeks to provide novel and timely information to vascular surgeons, interventionalists, phlebologists, wound care specialists, and allied health professionals who treat patients presenting with vascular and lymphatic disorders. As the official publication of the Society for Vascular Surgery and the American Venous Forum, the journal publishes, after peer review, selected papers presented at the annual meetings of these organizations and affiliated vascular societies, as well as original articles from members and non-members.