Toward automating GRADE classification: a proof-of-concept evaluation of an artificial intelligence-based tool for semiautomated evidence quality rating in systematic reviews.
Alisson Oliveira Dos Santos, Vinícius Silva Belo, Tales Mota Machado, Eduardo Sérgio da Silva
BMJ Evidence-Based Medicine. Journal Article. Published 2025-04-07. DOI: 10.1136/bmjebm-2024-113123 (https://doi.org/10.1136/bmjebm-2024-113123)
Citations: 0
Abstract
Background: Evaluation of the quality of evidence in systematic reviews (SRs) is essential for sound decision-making. Although Grading of Recommendations Assessment, Development and Evaluation (GRADE) affords a consolidated approach for rating the level of evidence, its application is complex and time-consuming. Artificial intelligence (AI) can be used to overcome these barriers.
Design: Analytical experimental study.
Objective: The objective is to develop and appraise a proof-of-concept AI-powered tool for the semiautomation of an adaptation of the GRADE classification system to determine levels of evidence in SRs with meta-analyses compiled from randomised clinical trials.
Methods: The URSE-automated system was based on an algorithm created to enhance the objectivity of the GRADE classification. It was developed using the Python language and the React library to create user-friendly interfaces. Evaluation of the URSE-automated system was performed by analysing 115 SRs from the Cochrane Library and comparing the predicted levels of evidence with those generated by human evaluators.
Results: The open-source URSE code is available on GitHub (http://www.github.com/alisson-mfc/urse). The agreement between the URSE-automated GRADE system and human evaluators regarding the quality of evidence was 63.2% with a Cohen's kappa coefficient of 0.44. The metrics of the GRADE domains evaluated included accuracy and F1-scores, which were 0.97 and 0.94 for imprecision (number of participants), 0.73 and 0.7 for risk of bias, 0.9 and 0.9 for I² values (heterogeneity) and 0.98 and 0.99 for quality of methodology (A Measurement Tool to Assess Systematic Reviews), respectively.
Conclusion: The results demonstrate the potential use of AI in assessing the quality of evidence. However, because the GRADE approach emphasises subjective judgement and an understanding of the context in which evidence is produced, full automation of the classification process is not yet appropriate. Nevertheless, combining the URSE-automated system with human evaluation, or integrating the tool into other platforms, represents a promising direction for future work.
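The agreement statistics reported in the Results (percentage agreement and Cohen's kappa between the tool and human evaluators) can be illustrated with a minimal sketch. This is not the URSE code: the function names and the eight example ratings below are hypothetical and chosen only to show how the two statistics are conventionally computed for two raters assigning categorical levels of evidence.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of items on which the two raters assign the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same label
    # if each labelled independently according to their own marginals.
    labels = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    if p_expected == 1.0:
        # Both raters used a single identical label for every item.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings (NOT data from the study), for illustration only.
tool_ratings = ["high", "moderate", "low", "low",
                "very low", "moderate", "high", "low"]
human_ratings = ["high", "low", "low", "moderate",
                 "very low", "moderate", "moderate", "low"]

print(f"agreement: {percent_agreement(tool_ratings, human_ratings):.3f}")
print(f"kappa:     {cohens_kappa(tool_ratings, human_ratings):.3f}")
```

Kappa is lower than raw agreement because it discounts the matches that two raters would produce by chance given how often each uses every category, which is why the abstract reports both figures.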
About the journal:
BMJ Evidence-Based Medicine (BMJ EBM) publishes original evidence-based research, insights and opinions on what matters for health care. We focus on the tools, methods, and concepts that are basic and central to practising evidence-based medicine and deliver relevant, trustworthy and impactful evidence.
BMJ EBM is a Plan S compliant Transformative Journal and adheres to the highest possible industry standards for editorial policies and publication ethics.