Grace Riley, Elizabeth Wang, Camille Flynn, Ashley Lopez, Aparna Sridhar
Title: Evaluating the fidelity of AI-generated information on long-acting reversible contraceptive methods.
Journal: European Journal of Contraception and Reproductive Health Care, pp. 1-4
DOI: 10.1080/13625187.2025.2450011
Published: 2025-02-06
Citations: 0
Abstract
Introduction: Artificial intelligence (AI) has many applications in health care. Popular AI chatbots, such as ChatGPT, have the potential to make complex health topics more accessible to the general public. This study assesses the accuracy of the long-acting reversible contraception information ChatGPT currently provides.
Methods: We presented a set of eight frequently asked questions about long-acting reversible contraception (LARC) to ChatGPT, repeating the set on three separate days. Each question was also repeated with the LARC name changed (e.g., 'hormonal implant' vs 'Nexplanon') to account for variable terminology. Two coders independently assessed the AI-generated answers for accuracy, language inclusivity, and readability. Scores from the three duplicated sets were averaged.
Results: A total of 264 responses were generated. Of these, 69.3% were accurate and 16.3% contained inaccurate information; the most common inaccuracy was outdated information about the duration of use of LARCs. A further 14.4% included misleading statements based on conflicting evidence, such as the claim that intrauterine devices increase one's risk of pelvic inflammatory disease. 45.1% of responses used gender-exclusive language, referring only to women. The average Flesch Reading Ease score was 42.8 (SD 7.1), corresponding to a college reading level.
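The Flesch Reading Ease score reported above comes from a standard formula based on average sentence length and syllables per word; scores in the 30-50 band are conventionally read as college-level text. A minimal Python sketch of that formula is shown below, using a naive vowel-group heuristic for syllable counting rather than whatever scoring tool the study actually used:

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count contiguous vowel groups as syllables
    # (a rough proxy; real syllabification is more involved).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # Flesch Reading Ease:
    #   206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Higher scores indicate easier text; short words and short sentences push the score up, which is why dense clinical prose lands in the 30-50 "college" band.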
Conclusion: ChatGPT offers important information about LARCs, though a minority of responses were inaccurate or misleading. A significant limitation is the model's reliance on training data from before October 2021. While AI tools can be a valuable resource for simple medical queries, users should be cautious of the potential for inaccurate information.
Short condensation: ChatGPT generally provides accurate and adequate information about long-acting contraception. However, it occasionally makes false or misleading claims.
Journal introduction:
The European Journal of Contraception and Reproductive Health Care, the official journal of the European Society of Contraception and Reproductive Health, publishes original peer-reviewed research papers as well as review papers and other appropriate educational material.