{"title":"AI-driven abstract generating: evaluating LLMs with a tailored prompt under the PRISMA-A framework.","authors":"Gizem Boztaş Demi̇r, Şule Gökmen, Yağızalp Süküt, Kübra Gülnur Topsakal, Serkan Görgülü","doi":"10.1186/s12903-025-06982-4","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>This study aimed to assess and compare ChatGPT-4o and Gemini Pro's ability to generate structured abstracts from full-text systematic reviews and meta-analyses in orthodontics, based on adherence to the PRISMA Abstract (PRISMA-A) Checklist, using a customised prompt developed for this purpose.</p><p><strong>Materials and methods: </strong>A total of 162 full-text systematic reviews and meta-analyses published in Q1-ranked orthodontic journals since January 2019 were included. Each full-text article was processed by ChatGPT-4o and Gemini Pro, using a PRISMA-A Checklist-aligned structured prompt. Outputs were scored using a tailored Overall quality Score OQS derived from 11 PRISMA-A checklist. Inter-rater and time-dependent reliability were assessed with Intraclass Correlation Coefficients (ICCs), and model outputs were compared using Mann-Whitney U tests.</p><p><strong>Results: </strong>Both models yielded satisfactory OQS in generating PRISMA-A checklist compliant abstracts; however, ChatGPT-4o consistently achieved higher scores than Gemini Pro. The most notable differences were observed in the \"Included Studies\" and \"Synthesis of Results\" sections, where ChatGPT-4o produced more complete and structurally coherent outputs. ChatGPT-4o achieved a mean OQS of 21.67 (SD 0.58) versus 21.00 (SD 0.71) for Gemini Pro, a difference that was highly significant (p < 0.001).</p><p><strong>Conclusions: </strong>Both LLMs demonstrated the ability to generate PRISMA-A-compliant abstracts from systematic reviews, with ChatGPT-4o consistently achieving higher quality scores than Gemini Pro. 
While tested in orthodontics, the approach holds potential for broader applications across evidence-based dental and medical research. Systematic reviews and meta-analyses are essential to evidence-based dentistry but can be challenging and time-consuming to report in accordance with established standards. The structured prompt developed in this study may assist researchers in generating PRISMA-A-compliant outputs more efficiently, helping to accelerate the completion and standardisation of high-level clinical evidence reporting.</p>","PeriodicalId":9072,"journal":{"name":"BMC Oral Health","volume":"25 1","pages":"1594"},"PeriodicalIF":3.1000,"publicationDate":"2025-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12512598/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Oral Health","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s12903-025-06982-4","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Cited by: 0
Abstract
Background: This study aimed to assess and compare ChatGPT-4o and Gemini Pro's ability to generate structured abstracts from full-text systematic reviews and meta-analyses in orthodontics, based on adherence to the PRISMA Abstract (PRISMA-A) Checklist, using a customised prompt developed for this purpose.
Materials and methods: A total of 162 full-text systematic reviews and meta-analyses published in Q1-ranked orthodontic journals since January 2019 were included. Each full-text article was processed by ChatGPT-4o and Gemini Pro, using a PRISMA-A Checklist-aligned structured prompt. Outputs were scored using a tailored Overall Quality Score (OQS) derived from the 11 PRISMA-A checklist items. Inter-rater and time-dependent reliability were assessed with Intraclass Correlation Coefficients (ICCs), and model outputs were compared using Mann-Whitney U tests.
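The comparison step described above uses the rank-based Mann-Whitney U test on the two models' per-abstract OQS values. A minimal, dependency-free sketch of that statistic is shown below; the score lists are illustrative placeholders, not the study's data, and the exact item weighting behind the OQS is not specified here.

```python
def midranks(values):
    """Assign 1-based ranks, averaging ranks across tied values (mid-ranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        # find the run of tied values starting at sorted position i
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Mann-Whitney U statistic, reported as the smaller of U1 and U2."""
    n1, n2 = len(x), len(y)
    r = midranks(list(x) + list(y))
    r1 = sum(r[:n1])                # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    return min(u1, n1 * n2 - u1)


# Hypothetical per-abstract OQS values for illustration only.
chatgpt_scores = [22, 22, 21, 22, 21, 22]
gemini_scores = [21, 20, 21, 22, 20, 21]
print(mann_whitney_u(chatgpt_scores, gemini_scores))
```

Converting U into a p-value additionally requires the exact null distribution (small samples) or a tie-corrected normal approximation, which statistical packages handle automatically; only the test statistic itself is sketched here.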
Results: Both models yielded satisfactory OQS values when generating PRISMA-A-compliant abstracts; however, ChatGPT-4o consistently achieved higher scores than Gemini Pro. The most notable differences were observed in the "Included Studies" and "Synthesis of Results" sections, where ChatGPT-4o produced more complete and structurally coherent outputs. ChatGPT-4o achieved a mean OQS of 21.67 (SD 0.58) versus 21.00 (SD 0.71) for Gemini Pro, a difference that was highly significant (p < 0.001).
Conclusions: Both LLMs demonstrated the ability to generate PRISMA-A-compliant abstracts from systematic reviews, with ChatGPT-4o consistently achieving higher quality scores than Gemini Pro. While tested in orthodontics, the approach holds potential for broader applications across evidence-based dental and medical research. Systematic reviews and meta-analyses are essential to evidence-based dentistry but can be challenging and time-consuming to report in accordance with established standards. The structured prompt developed in this study may assist researchers in generating PRISMA-A-compliant outputs more efficiently, helping to accelerate the completion and standardisation of high-level clinical evidence reporting.
About the journal:
BMC Oral Health is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of disorders of the mouth, teeth and gums, as well as related molecular genetics, pathophysiology, and epidemiology.