{"title":"AI-driven abstract generating: evaluating LLMs with a tailored prompt under the PRISMA-A framework.","authors":"Gizem Boztaş Demi̇r, Şule Gökmen, Yağızalp Süküt, Kübra Gülnur Topsakal, Serkan Görgülü","doi":"10.1186/s12903-025-06982-4","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>This study aimed to assess and compare ChatGPT-4o and Gemini Pro's ability to generate structured abstracts from full-text systematic reviews and meta-analyses in orthodontics, based on adherence to the PRISMA Abstract (PRISMA-A) Checklist, using a customised prompt developed for this purpose.</p><p><strong>Materials and methods: </strong>A total of 162 full-text systematic reviews and meta-analyses published in Q1-ranked orthodontic journals since January 2019 were included. Each full-text article was processed by ChatGPT-4o and Gemini Pro, using a PRISMA-A Checklist-aligned structured prompt. Outputs were scored using a tailored Overall quality Score OQS derived from 11 PRISMA-A checklist. Inter-rater and time-dependent reliability were assessed with Intraclass Correlation Coefficients (ICCs), and model outputs were compared using Mann-Whitney U tests.</p><p><strong>Results: </strong>Both models yielded satisfactory OQS in generating PRISMA-A checklist compliant abstracts; however, ChatGPT-4o consistently achieved higher scores than Gemini Pro. The most notable differences were observed in the \"Included Studies\" and \"Synthesis of Results\" sections, where ChatGPT-4o produced more complete and structurally coherent outputs. ChatGPT-4o achieved a mean OQS of 21.67 (SD 0.58) versus 21.00 (SD 0.71) for Gemini Pro, a difference that was highly significant (p < 0.001).</p><p><strong>Conclusions: </strong>Both LLMs demonstrated the ability to generate PRISMA-A-compliant abstracts from systematic reviews, with ChatGPT-4o consistently achieving higher quality scores than Gemini Pro. 
While tested in orthodontics, the approach holds potential for broader applications across evidence-based dental and medical research. Systematic reviews and meta-analyses are essential to evidence-based dentistry but can be challenging and time-consuming to report in accordance with established standards. The structured prompt developed in this study may assist researchers in generating PRISMA-A-compliant outputs more efficiently, helping to accelerate the completion and standardisation of high-level clinical evidence reporting.</p>","PeriodicalId":9072,"journal":{"name":"BMC Oral Health","volume":"25 1","pages":"1594"},"PeriodicalIF":3.1000,"publicationDate":"2025-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12512598/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Oral Health","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s12903-025-06982-4","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Cited by: 0
Abstract
Background: This study aimed to assess and compare ChatGPT-4o and Gemini Pro's ability to generate structured abstracts from full-text systematic reviews and meta-analyses in orthodontics, based on adherence to the PRISMA Abstract (PRISMA-A) Checklist, using a customised prompt developed for this purpose.
Materials and methods: A total of 162 full-text systematic reviews and meta-analyses published in Q1-ranked orthodontic journals since January 2019 were included. Each full-text article was processed by ChatGPT-4o and Gemini Pro, using a PRISMA-A Checklist-aligned structured prompt. Outputs were scored using a tailored Overall Quality Score (OQS) derived from the 11 PRISMA-A checklist items. Inter-rater and time-dependent reliability were assessed with Intraclass Correlation Coefficients (ICCs), and model outputs were compared using Mann-Whitney U tests.
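The comparison step described above uses the rank-based Mann-Whitney U test on the two models' per-abstract OQS values. A minimal, dependency-free sketch of that statistic is shown below; the score lists are illustrative placeholders, not the study's data, and the exact item weighting behind the OQS is not specified here.

```python
def midranks(values):
    """Assign 1-based ranks, averaging ranks across tied values (mid-ranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        # find the run of tied values starting at sorted position i
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Mann-Whitney U statistic, reported as the smaller of U1 and U2."""
    n1, n2 = len(x), len(y)
    r = midranks(list(x) + list(y))
    r1 = sum(r[:n1])                # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    return min(u1, n1 * n2 - u1)


# Hypothetical per-abstract OQS values for illustration only.
chatgpt_scores = [22, 22, 21, 22, 21, 22]
gemini_scores = [21, 20, 21, 22, 20, 21]
print(mann_whitney_u(chatgpt_scores, gemini_scores))
```

Converting U into a p-value additionally requires the exact null distribution (small samples) or a tie-corrected normal approximation, which statistical packages handle automatically; only the test statistic itself is sketched here.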
Results: Both models yielded satisfactory OQS values when generating PRISMA-A-compliant abstracts; however, ChatGPT-4o consistently achieved higher scores than Gemini Pro. The most notable differences were observed in the "Included Studies" and "Synthesis of Results" sections, where ChatGPT-4o produced more complete and structurally coherent outputs. ChatGPT-4o achieved a mean OQS of 21.67 (SD 0.58) versus 21.00 (SD 0.71) for Gemini Pro, a difference that was highly significant (p < 0.001).
Conclusions: Both LLMs demonstrated the ability to generate PRISMA-A-compliant abstracts from systematic reviews, with ChatGPT-4o consistently achieving higher quality scores than Gemini Pro. While tested in orthodontics, the approach holds potential for broader applications across evidence-based dental and medical research. Systematic reviews and meta-analyses are essential to evidence-based dentistry but can be challenging and time-consuming to report in accordance with established standards. The structured prompt developed in this study may assist researchers in generating PRISMA-A-compliant outputs more efficiently, helping to accelerate the completion and standardisation of high-level clinical evidence reporting.
About the journal:
BMC Oral Health is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of disorders of the mouth, teeth and gums, as well as related molecular genetics, pathophysiology, and epidemiology.