Angelo Cadiente, Catherine Implicito, Abinav Udaiyar, Andre Ho, Christopher Wan, Jamie Chen, Charles Palmer, Qilin Cao, Michael Raver, Katerina Lembrikova, Mubashir Billah
{"title":"摘要:人工智能生成与Cochrane综述。","authors":"Angelo Cadiente, Catherine Implicito, Abinav Udaiyar, Andre Ho, Christopher Wan, Jamie Chen, Charles Palmer, Qilin Cao, Michael Raver, Katerina Lembrikova, Mubashir Billah","doi":"10.1097/SPV.0000000000001688","DOIUrl":null,"url":null,"abstract":"<p><strong>Importance: </strong>As the volume of medical literature continues to expand, the provision of artificial intelligence (AI) to produce concise, accessible summaries has the potential to enhance the efficacy of content review.</p><p><strong>Objectives: </strong>This project assessed the readability and quality of summaries generated by ChatGPT in comparison to the Plain Text Summaries from Cochrane Review, a systematic review database, in incontinence research.</p><p><strong>Study design: </strong>Seventy-three abstracts from the Cochrane Library tagged under \"Incontinence\" were summarized using ChatGPT-3.5 (July 2023 Version) and compared with their corresponding Cochrane Plain Text Summaries. Readability was assessed using Flesch Kincaid Reading Ease, Flesch Kincaid Grade Level, Gunning Fog Score, Smog Index, Coleman Liau Index, and Automated Readability Index. A 2-tailed t test was used to compare the summaries. Each summary was also evaluated by 2 blinded, independent reviewers on a 5-point scale where higher scores indicated greater accuracy and adherence to the abstract.</p><p><strong>Results: </strong>Compared to ChatGPT, Cochrane Review's Plain Text Summaries scored higher in the numerical Flesch Kincaid Reading Ease score and showed lower necessary education levels in the 5 other readability metrics with statistical significance, indicating better readability. 
However, ChatGPT earned a higher mean accuracy grade of 4.25 compared to Cochrane Review's mean grade of 4.05 with statistical significance.</p><p><strong>Conclusions: </strong>Cochrane Review's Plain Text Summaries provide clearer summaries of the incontinence literature when compared to ChatGPT, yet ChatGPT generated more comprehensive summaries. While ChatGPT can effectively summarize the medical literature, further studies can improve reader accessibility to these summaries.</p>","PeriodicalId":75288,"journal":{"name":"Urogynecology (Hagerstown, Md.)","volume":" ","pages":""},"PeriodicalIF":0.8000,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Evaluating Incontinence Abstracts: Artificial Intelligence-Generated Versus Cochrane Review.\",\"authors\":\"Angelo Cadiente, Catherine Implicito, Abinav Udaiyar, Andre Ho, Christopher Wan, Jamie Chen, Charles Palmer, Qilin Cao, Michael Raver, Katerina Lembrikova, Mubashir Billah\",\"doi\":\"10.1097/SPV.0000000000001688\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Importance: </strong>As the volume of medical literature continues to expand, the provision of artificial intelligence (AI) to produce concise, accessible summaries has the potential to enhance the efficacy of content review.</p><p><strong>Objectives: </strong>This project assessed the readability and quality of summaries generated by ChatGPT in comparison to the Plain Text Summaries from Cochrane Review, a systematic review database, in incontinence research.</p><p><strong>Study design: </strong>Seventy-three abstracts from the Cochrane Library tagged under \\\"Incontinence\\\" were summarized using ChatGPT-3.5 (July 2023 Version) and compared with their corresponding Cochrane Plain Text Summaries. 
Readability was assessed using Flesch Kincaid Reading Ease, Flesch Kincaid Grade Level, Gunning Fog Score, Smog Index, Coleman Liau Index, and Automated Readability Index. A 2-tailed t test was used to compare the summaries. Each summary was also evaluated by 2 blinded, independent reviewers on a 5-point scale where higher scores indicated greater accuracy and adherence to the abstract.</p><p><strong>Results: </strong>Compared to ChatGPT, Cochrane Review's Plain Text Summaries scored higher in the numerical Flesch Kincaid Reading Ease score and showed lower necessary education levels in the 5 other readability metrics with statistical significance, indicating better readability. However, ChatGPT earned a higher mean accuracy grade of 4.25 compared to Cochrane Review's mean grade of 4.05 with statistical significance.</p><p><strong>Conclusions: </strong>Cochrane Review's Plain Text Summaries provide clearer summaries of the incontinence literature when compared to ChatGPT, yet ChatGPT generated more comprehensive summaries. 
While ChatGPT can effectively summarize the medical literature, further studies can improve reader accessibility to these summaries.</p>\",\"PeriodicalId\":75288,\"journal\":{\"name\":\"Urogynecology (Hagerstown, Md.)\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.8000,\"publicationDate\":\"2025-04-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Urogynecology (Hagerstown, Md.)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1097/SPV.0000000000001688\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"OBSTETRICS & GYNECOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Urogynecology (Hagerstown, Md.)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1097/SPV.0000000000001688","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"OBSTETRICS & GYNECOLOGY","Score":null,"Total":0}
Evaluating Incontinence Abstracts: Artificial Intelligence-Generated Versus Cochrane Review.
Importance: As the volume of medical literature continues to expand, using artificial intelligence (AI) to produce concise, accessible summaries has the potential to enhance the efficacy of content review.
Objectives: This project assessed the readability and quality of summaries of incontinence research generated by ChatGPT, compared with the Plain Text Summaries from the Cochrane Review, a systematic review database.
Study design: Seventy-three abstracts from the Cochrane Library tagged under "Incontinence" were summarized using ChatGPT-3.5 (July 2023 version) and compared with their corresponding Cochrane Plain Text Summaries. Readability was assessed using the Flesch-Kincaid Reading Ease, Flesch-Kincaid Grade Level, Gunning Fog Score, SMOG Index, Coleman-Liau Index, and Automated Readability Index. A two-tailed t test was used to compare the summaries. Each summary was also evaluated by two blinded, independent reviewers on a 5-point scale, where higher scores indicated greater accuracy and adherence to the abstract.
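The two Flesch-Kincaid metrics named above are closed-form functions of sentence, word, and syllable counts. As an illustration of what these scores measure (a sketch using the standard published formulas, not the study's own code), with hypothetical counts for a 100-word summary:

```python
def flesch_reading_ease(words, sentences, syllables):
    # Flesch Reading Ease: higher scores indicate easier text.
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    # Flesch-Kincaid Grade Level: approximate U.S. school grade required.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Hypothetical counts: 100 words, 5 sentences, 150 syllables.
fre = flesch_reading_ease(100, 5, 150)   # about 59.6 ("fairly difficult")
fkgl = flesch_kincaid_grade(100, 5, 150) # about grade 9.9
```

Both formulas penalize long sentences (words per sentence) and polysyllabic vocabulary (syllables per word), which is why the Reading Ease score falls as the grade-level estimates rise.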
Results: Cochrane Review's Plain Text Summaries scored significantly higher than ChatGPT's on Flesch-Kincaid Reading Ease and required significantly lower education levels on the five other readability metrics, indicating better readability. However, ChatGPT earned a significantly higher mean accuracy grade than Cochrane Review (4.25 vs 4.05).
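The significance claims above rest on a two-tailed t test over matched pairs of summaries. A minimal sketch of the paired t statistic, on hypothetical reviewer grades (not the study's data), can be written with the standard library alone:

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    # Paired t statistic: mean of per-pair differences over its standard error.
    # A two-tailed p-value would come from a t distribution with len(d) - 1 df.
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

# Hypothetical accuracy grades for the same four abstracts.
chatgpt_scores = [4, 5, 4, 5]
cochrane_scores = [4, 4, 4, 4]
t_stat = paired_t(chatgpt_scores, cochrane_scores)
```

Pairing by abstract is the natural design here, since each ChatGPT summary has a directly corresponding Cochrane Plain Text Summary of the same source review.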
Conclusions: Cochrane Review's Plain Text Summaries provide clearer summaries of the incontinence literature than ChatGPT, yet ChatGPT generated more comprehensive summaries. While ChatGPT can effectively summarize the medical literature, further study could improve the accessibility of these summaries to readers.