Grace Riley, Elizabeth Wang, Camille Flynn, Ashley Lopez, Aparna Sridhar
Title: Evaluating the fidelity of AI-generated information on long-acting reversible contraceptive methods.
Journal: European Journal of Contraception and Reproductive Health Care, pp. 1-4
DOI: 10.1080/13625187.2025.2450011
Published: 2025-02-06
Citations: 0
Abstract
Introduction: Artificial intelligence (AI) has many applications in health care. Popular AI chatbots, such as ChatGPT, have the potential to make complex health topics more accessible to the general public. This study assesses the accuracy of the long-acting reversible contraception information ChatGPT currently provides.
Methods: We presented a set of eight frequently asked questions about long-acting reversible contraception (LARC) to ChatGPT, repeating the set on three separate days. Each question was also repeated with the LARC name changed (e.g., 'hormonal implant' vs 'Nexplanon') to account for variable terminology. Two coders independently assessed the AI-generated answers for accuracy, language inclusivity, and readability. Scores from the three duplicated sets were averaged.
Results: A total of 264 responses were generated. Of these, 69.3% were accurate and 16.3% contained inaccurate information; the most common inaccuracy was outdated information about the duration of use of LARCs. A further 14.4% included misleading statements based on conflicting evidence, such as the claim that intrauterine devices increase one's risk of pelvic inflammatory disease. 45.1% of responses used gender-exclusive language, referring only to women. The average Flesch Reading Ease score was 42.8 (SD 7.1), corresponding to a college reading level.
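The Flesch Reading Ease score reported above comes from a standard formula based on average sentence length and syllables per word; scores in the 30-50 band are conventionally read as college-level text. A minimal Python sketch of that formula is shown below, using a naive vowel-group heuristic for syllable counting rather than whatever scoring tool the study actually used:

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count contiguous vowel groups as syllables
    # (a rough proxy; real syllabification is more involved).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # Flesch Reading Ease:
    #   206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Higher scores indicate easier text; short words and short sentences push the score up, which is why dense clinical prose lands in the 30-50 "college" band.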
Conclusion: ChatGPT offers important information about LARCs, though a minority of responses were inaccurate or misleading. A significant limitation is the model's reliance on training data from before October 2021. While AI tools can be a valuable resource for simple medical queries, users should be cautious of the potential for inaccurate information.
Short condensation: ChatGPT generally provides accurate and adequate information about long-acting contraception. However, it occasionally makes false or misleading claims.
Journal introduction:
The European Journal of Contraception and Reproductive Health Care, the official journal of the European Society of Contraception and Reproductive Health, publishes original peer-reviewed research papers as well as review papers and other appropriate educational material.