Yolanda Freire, Andrea Santamaría Laorden, Jaime Orejas Pérez, Ignacio Ortiz Collado, Margarita Gómez Sánchez, Israel J Thuissard Vasallo, Víctor Díaz-Flores García, Ana Suárez
Evaluating the influence of prompt formulation on the reliability and repeatability of ChatGPT in implant-supported prostheses.
Large language models (LLMs) such as ChatGPT are widely available to dental professionals. However, there is limited evidence on the reliability and repeatability of ChatGPT-4 with respect to implant-supported prostheses, or on the impact of prompt design on its responses, which constrains understanding of its application in this area of dentistry. The purpose of this study was to evaluate the performance of ChatGPT-4 in generating answers about implant-supported prostheses using different prompts. Thirty questions on implant-supported and implant-retained prostheses were posed, and 30 answers were generated per question with each of a general and a specific prompt, totaling 1,800 answers. Experts assessed reliability (agreement with expert grading) and repeatability (response consistency) using a 3-point Likert scale. The general prompt achieved 70.89% reliability, with repeatability ranging from moderate to almost perfect. The specific prompt performed better, with 78.8% reliability and substantial to almost perfect repeatability, significantly improving reliability compared to the general prompt. Despite these promising results, ChatGPT's ability to generate reliable answers on implant-supported prostheses remains limited, highlighting the need for professional oversight; the use of a specific prompt may improve its answer generation performance.
Journal description:
PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open access: freely accessible online; authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage