Addressing Commonly Asked Questions in Urogynecology: Accuracy and Limitations of ChatGPT
Gregory Vurture, Nicole Jenkins, James Ross, Stephanie Sansone, Ellen Conner, Nina Jacobson, Scott Smilen, Jonathan Baum
International Urogynecology Journal (Q3, Obstetrics & Gynecology; Impact Factor 1.8)
Published: 2025-06-18
DOI: 10.1007/s00192-025-06184-0 (https://doi.org/10.1007/s00192-025-06184-0)
Citations: 0
Abstract
Introduction and hypothesis: Existing literature suggests that large language models such as Chat Generative Pre-trained Transformer (ChatGPT) may provide inaccurate and unreliable health care information. The literature on its performance in urogynecology is scarce. The aim of the present study was to assess ChatGPT's ability to accurately answer commonly asked urogynecology patient questions.
Methods: An expert panel of five board-certified urogynecologists and two fellows developed ten questions commonly asked by patients in a urogynecology office. Questions were phrased using the diction and wording a patient might use when asking a question over the internet. ChatGPT's responses were evaluated using the Brief DISCERN (BD) tool, a validated scoring system for online health care information; scores ≥ 16 are consistent with good-quality content. Responses were graded on their accuracy and consistency with expert opinion and published guidelines.
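To make the scoring concrete, the following is a minimal sketch (not from the study) of how BD totals and the good-quality threshold could be tallied. It assumes the commonly described six-item, 1-5 Brief DISCERN structure; the topic names and item ratings shown are hypothetical placeholders.

```python
from statistics import mean, pstdev

# Illustrative only: hypothetical Brief DISCERN (BD) item ratings, not data
# from the study. BD is commonly described as a six-item instrument with each
# item rated 1-5 (totals 6-30); the study treats totals >= 16 as good quality.
GOOD_QUALITY_THRESHOLD = 16

# Hypothetical per-item ratings for two of the ten question topics.
bd_ratings = {
    "Pelvic Organ Prolapse": [2, 2, 3, 2, 3, 2],
    "Interstitial Cystitis": [4, 4, 4, 3, 4, 3],
}

totals = {}
for topic, items in bd_ratings.items():
    total = sum(items)
    totals[topic] = total
    quality = "good" if total >= GOOD_QUALITY_THRESHOLD else "poor"
    print(f"{topic}: BD = {total} ({quality} quality)")

# Mean +/- SD across questions, analogous to the summary statistics reported
# in the Results.
scores = list(totals.values())
print(f"Mean BD across questions: {mean(scores):.1f} +/- {pstdev(scores):.1f}")
```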
Results: The average score across all ten questions was 18.9 ± 2.7. Nine out of ten (90%) questions had a response that was determined to be of good quality (BD ≥ 16). The lowest scoring topic was "Pelvic Organ Prolapse" (mean BD = 14.0 ± 2.0). The highest scoring topic was "Interstitial Cystitis" (mean BD = 22.0 ± 0). ChatGPT provided no references for its responses.
Conclusions: ChatGPT provided high-quality responses to 90% of the questions based on an expert panel's review with the BD tool. Nonetheless, given the evolving nature of this technology, continued analysis is crucial before ChatGPT can be accepted as accurate and reliable.
Journal Introduction:
The International Urogynecology Journal is the official journal of the International Urogynecological Association (IUGA). The journal has evolved in response to a perceived need among the clinicians, scientists, and researchers active in the field of urogynecology and pelvic floor disorders. Gynecologists, urologists, physiotherapists, nurses, and basic scientists require regular means of communication within this field of pelvic floor dysfunction to express new ideas and research, and to review clinical practice in the diagnosis and treatment of women with disorders of the pelvic floor. The journal uses peer review for all original contributions and maintains high standards for the research published therein. The clinical approach to urogynecology and pelvic floor disorders is emphasized, with each issue containing clinically relevant material that is immediately applicable to clinical medicine. This publication covers all aspects of the field in an interdisciplinary fashion.