Xingyu Wan, Ruiyan Wang, Junxian Zhao, Tianhu Liang, Bingyi Wang, Jie Zhang, Yujia Liu, Yan Ma, Yaolong Chen, Xinghua Lv
Journal of Evidence-Based Medicine, vol. 18, issue 1. Published 2025-03-23. DOI: 10.1111/jebm.70017. https://onlinelibrary.wiley.com/doi/10.1111/jebm.70017
From Manual to Machine: Revolutionizing Day Surgery Guideline and Consensus Quality Assessment With Large Language Models
Objective
To evaluate the methodological and reporting quality of clinical practice guidelines and expert consensus statements for ambulatory surgery centers published since 2000, combining manual assessment with large language model (LLM) analysis, and to explore the feasibility of using LLMs for quality evaluation.
Methods
We systematically searched Chinese- and English-language databases and guideline repositories. Two researchers independently screened the literature and extracted data. Quality was assessed with the AGREE II and RIGHT tools, both manually and with GPT-4o.
Results
A total of 54 eligible documents were included. Mean compliance by AGREE II domain was: Scope and purpose 25.00%, Stakeholder involvement 20.16%, Rigor of development 17.28%, Clarity of presentation 41.56%, Applicability 18.06%, Editorial independence 26.39%. Mean compliance by RIGHT domain was: Basic information 44.44%, Background 36.11%, Evidence 14.07%, Recommendations 34.66%, Review and quality assurance 3.70%, Funding and declaration and management of interests 24.54%, Other information 27.16%. LLM-assigned scores were significantly higher than manual scores with both tools. Subgroup analyses showed higher quality in documents with evidence retrieval, conflict disclosure, funding support, and LLM integration (P < 0.05).
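The abstract does not state how the AGREE II percentages were computed. As a rough illustration, the sketch below applies the scaled domain score formula from the AGREE II user manual, (obtained − minimum possible) / (maximum possible − minimum possible), where each item is rated from 1 to 7 by each appraiser. The function name and the example ratings are hypothetical and are not data from the study.

```python
def agree_ii_scaled_score(ratings):
    """Scaled AGREE II domain score, as a percentage.

    ratings: one list per appraiser, each holding that appraiser's
    item scores (integers 1-7) for a single domain.
    """
    if not ratings or not ratings[0]:
        raise ValueError("need at least one appraiser and one item")
    n_appraisers = len(ratings)
    n_items = len(ratings[0])
    if any(len(r) != n_items for r in ratings):
        raise ValueError("every appraiser must rate the same items")
    if any(not 1 <= s <= 7 for r in ratings for s in r):
        raise ValueError("AGREE II item scores range from 1 to 7")

    obtained = sum(sum(r) for r in ratings)
    min_possible = 1 * n_items * n_appraisers  # every item rated 1
    max_possible = 7 * n_items * n_appraisers  # every item rated 7
    return 100.0 * (obtained - min_possible) / (max_possible - min_possible)


# Hypothetical ratings: two appraisers, three items in one domain.
example = [[2, 3, 2], [3, 2, 3]]
print(agree_ii_scaled_score(example))  # (15 - 6) / (42 - 6) = 25.0
```

A domain rated 1 on every item scores 0% and a domain rated 7 on every item scores 100%, so low percentages such as those reported above correspond to item ratings near the bottom of the scale.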
Conclusion
Current guidelines and consensus statements on day surgery need improvement in both methodological quality and reporting quality. The study supports the supplementary value of LLMs in quality assessment while emphasizing that manual evaluation must remain the foundation.
About the journal:
The Journal of Evidence-Based Medicine (EMB) is an esteemed international healthcare and medical decision-making journal, dedicated to publishing groundbreaking research outcomes in evidence-based decision-making, research, practice, and education. Serving as the official English-language journal of the Cochrane China Centre and West China Hospital of Sichuan University, we eagerly welcome editorials, commentaries, and systematic reviews encompassing various topics such as clinical trials, policy, drug and patient safety, education, and knowledge translation.