From Manual to Machine: Revolutionizing Day Surgery Guideline and Consensus Quality Assessment With Large Language Models

IF 3.6 | Zone 2 (Medicine) | JCR Q1 | MEDICINE, GENERAL & INTERNAL
Xingyu Wan, Ruiyan Wang, Junxian Zhao, Tianhu Liang, Bingyi Wang, Jie Zhang, Yujia Liu, Yan Ma, Yaolong Chen, Xinghua Lv
Citations: 0

From Manual to Machine: Revolutionizing Day Surgery Guideline and Consensus Quality Assessment With Large Language Models

Objective

To evaluate the methodological and reporting quality of clinical practice guidelines and expert consensus statements for ambulatory surgery centers published since 2000, combining manual assessment with large language model (LLM) analysis, and to explore the feasibility of LLMs for quality evaluation.

Methods

We systematically searched Chinese- and English-language databases and guideline repositories. Two researchers independently screened the literature and extracted data. Quality was assessed with the AGREE II and RIGHT tools, both manually and with GPT-4o.
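
For readers unfamiliar with AGREE II scoring, the following is a minimal sketch of the scaled domain score as defined in the AGREE II user manual: (obtained score - minimum possible score) / (maximum possible score - minimum possible score), with each item rated from 1 to 7. The function name and the example ratings are hypothetical, and the abstract does not state whether the percentages reported below were computed in exactly this way.

```python
from typing import Sequence

# AGREE II items are rated on a 7-point scale (1 = strongly disagree, 7 = strongly agree).
MIN_RATING = 1
MAX_RATING = 7


def scaled_domain_score(ratings: Sequence[Sequence[int]]) -> float:
    """Return the AGREE II scaled score (0-100) for one domain.

    ratings[a][i] is appraiser a's rating for item i of the domain.
    Hypothetical helper for illustration; not taken from the study.
    """
    if not ratings or not ratings[0]:
        raise ValueError("need at least one appraiser and one item")
    n_items = len(ratings[0])
    for row in ratings:
        if len(row) != n_items:
            raise ValueError("every appraiser must rate every item")
        for rating in row:
            if not MIN_RATING <= rating <= MAX_RATING:
                raise ValueError("ratings must be between 1 and 7")

    n_appraisers = len(ratings)
    obtained = sum(sum(row) for row in ratings)
    min_possible = MIN_RATING * n_items * n_appraisers
    max_possible = MAX_RATING * n_items * n_appraisers
    return 100.0 * (obtained - min_possible) / (max_possible - min_possible)


# Hypothetical ratings: four appraisers scoring a three-item domain.
example_ratings = [[5, 6, 6], [6, 6, 7], [2, 4, 3], [3, 3, 2]]
print(f"{scaled_domain_score(example_ratings):.2f}%")
```

Normalizing by the minimum and maximum possible totals keeps scores comparable across domains with different numbers of items and across appraisal teams of different sizes.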

Results

Fifty-four eligible documents were included. Mean compliance across AGREE II domains was: Scope and purpose 25.00%, Stakeholder involvement 20.16%, Rigor of development 17.28%, Clarity of presentation 41.56%, Applicability 18.06%, Editorial independence 26.39%. Mean compliance across RIGHT domains was: Basic information 44.44%, Background 36.11%, Evidence 14.07%, Recommendations 34.66%, Review and quality assurance 3.70%, Funding and declaration and management of interests 24.54%, Other information 27.16%. LLM-based evaluation yielded significantly higher scores than manual assessment with both tools. Subgroup analyses showed higher quality in documents with evidence retrieval, conflict disclosure, funding support, and LLM integration (P < 0.05).

Conclusion

Current guidelines and consensus statements on day surgery need to improve in both methodological and reporting quality. The study supports the supplementary value of LLMs in quality assessment while emphasizing that manual evaluation must remain the foundation.

Source journal
Journal of Evidence‐Based Medicine
Category: MEDICINE, GENERAL & INTERNAL
CiteScore: 11.20
Self-citation rate: 1.40%
Articles published: 42
About the journal: The Journal of Evidence-Based Medicine (EMB) is an esteemed international healthcare and medical decision-making journal, dedicated to publishing groundbreaking research outcomes in evidence-based decision-making, research, practice, and education. Serving as the official English-language journal of the Cochrane China Centre and West China Hospital of Sichuan University, we eagerly welcome editorials, commentaries, and systematic reviews encompassing various topics such as clinical trials, policy, drug and patient safety, education, and knowledge translation.