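Enhancing systematic reviews in orthodontics: a comparative examination of GPT-3.5 and GPT-4 for generating PICO-based queries with tailored prompts and configurations.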

IF 2.8, Zone 3 (Medicine), Q1, DENTISTRY, ORAL SURGERY & MEDICINE

European journal of orthodontics, Pub Date: 2024-04-01, DOI: 10.1093/ejo/cjae011

Gizem Boztaş Demir, Yağızalp Süküt, Gökhan Serhat Duran, Kübra Gülnur Topsakal, Serkan Görgülü

Volume 46, Issue 2. Publication type: Journal Article. Platform: Semanticscholar.

Citations: 0

Abstract


Enhancing systematic reviews in orthodontics: a comparative examination of GPT-3.5 and GPT-4 for generating PICO-based queries with tailored prompts and configurations.

Objectives: The rapid advancement of Large Language Models (LLMs) has prompted an exploration of their efficacy in generating PICO-based (Patient, Intervention, Comparison, Outcome) queries, especially in the field of orthodontics. This study aimed to assess the usability of LLMs in aiding systematic review processes, with a specific focus on comparing the performance of ChatGPT 3.5 and ChatGPT 4 using a specialized prompt tailored for orthodontics.

Materials/methods: Five databases were searched to assemble a sample of 77 systematic reviews and meta-analyses published between 2016 and 2021. Using prompt engineering techniques, the LLMs were directed to formulate PICO questions, Boolean queries, and relevant keywords. The outputs were then evaluated for accuracy and consistency by independent researchers using three-point and six-point Likert scales. Furthermore, the PICO records of 41 studies, which were compatible with the PROSPERO records, were compared with the responses provided by the models.
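To make the relationship between a PICO question and a Boolean query concrete, the sketch below assembles one from the other. It is only an illustration, not the prompt or procedure used in the study: the `Pico` class, the helper names, and the orthodontic search terms are all hypothetical, and the grouping rule (synonyms joined with OR inside each element, elements joined with AND) is the conventional pattern for database searches rather than anything stated in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Pico:
    """Synonym lists for each PICO element (all values are illustrative)."""
    patient: list[str] = field(default_factory=list)
    intervention: list[str] = field(default_factory=list)
    comparison: list[str] = field(default_factory=list)
    outcome: list[str] = field(default_factory=list)


def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def or_group(terms: list[str]) -> str:
    """Join the synonyms of one PICO element with OR, inside parentheses."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def boolean_query(pico: Pico) -> str:
    """AND together the OR-groups of every non-empty PICO element."""
    groups = [pico.patient, pico.intervention, pico.comparison, pico.outcome]
    return " AND ".join(or_group(g) for g in groups if g)


# Hypothetical example terms, not taken from the study.
example = Pico(
    patient=["Class II malocclusion"],
    intervention=["clear aligners", "aligner"],
    comparison=["fixed appliances"],
    outcome=["treatment duration"],
)
print(boolean_query(example))
```

Empty elements are skipped, so a question without an explicit comparison still yields a valid query. A query produced this way, or by an LLM, would still need checking against each database's own search syntax.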

Results: ChatGPT 3.5 and 4 showcased a consistent ability to craft PICO-based queries. Statistically significant differences in accuracy were observed in specific categories, with GPT-4 often outperforming GPT-3.5.
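The abstract does not name the statistical test behind these comparisons. As one possible way to compare paired ordinal ratings of two models on the same set of reviews, the sketch below implements a two-sided exact sign test. The choice of test, the function name `sign_test_p`, and the ratings are assumptions made purely for illustration; the numbers are invented and are not the study's data.

```python
from math import comb


def sign_test_p(ratings_a: list[int], ratings_b: list[int]) -> float:
    """Two-sided exact sign test on paired ordinal ratings; ties are dropped."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("ratings must be paired")
    wins_a = sum(a > b for a, b in zip(ratings_a, ratings_b))
    wins_b = sum(b > a for a, b in zip(ratings_a, ratings_b))
    n = wins_a + wins_b
    if n == 0:
        return 1.0  # every pair tied, so there is no evidence of a difference
    k = min(wins_a, wins_b)
    # P(X <= k) under Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented three-point accuracy ratings for the same eight items.
model_a = [2, 1, 2, 1, 2, 1, 1, 2]
model_b = [2, 2, 3, 2, 3, 2, 2, 3]
p_value = sign_test_p(model_a, model_b)
print(f"p = {p_value:.4f}")
```

The sign test uses only the direction of each paired difference, which suits Likert ratings where the distance between scale points is not well defined. A Wilcoxon signed-rank test would be a common alternative when the size of the differences is considered meaningful.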

Limitations: The study's test set might not encapsulate the full range of LLM application scenarios. Emphasis on specific question types may also not reflect the complete capabilities of the models.

Conclusions/implications: Both ChatGPT 3.5 and 4 can be pivotal tools for generating PICO-driven queries in orthodontics when optimally configured. However, the precision required in medical research necessitates a judicious and critical evaluation of LLM-generated outputs, advocating for a circumspect integration into scientific investigations.


Source journal

European journal of orthodontics (Medicine: Dentistry and Oral Surgery)

CiteScore: 5.50

Self-citation rate: 7.70%

Articles published:

Review time: 4-8 weeks

About the journal: The European Journal of Orthodontics publishes papers of excellence on all aspects of orthodontics including craniofacial development and growth. The emphasis of the journal is on full research papers. Succinct and carefully prepared papers are favoured in terms of impact as well as readability.