Lack of methodological rigor and limited coverage of generative artificial intelligence in existing artificial intelligence reporting guidelines: a scoping review
Xufei Luo, Bingyi Wang, Qianling Shi, Zijun Wang, Honghao Lai, Hui Liu, Yishan Qin, Fengxian Chen, Xuping Song, Long Ge, Lu Zhang, Zhaoxiang Bian, Yaolong Chen
Journal of Clinical Epidemiology, Volume 186, Article 111903. Published 2025-07-18. Journal Article.
DOI: 10.1016/j.jclinepi.2025.111903
https://www.sciencedirect.com/science/article/pii/S0895435625002367
Impact factor: 5.2 (JCR Q1, Health Care Sciences & Services)
Citations: 0
Abstract
Objectives
This study aimed to systematically map the development methods, scope, and limitations of existing artificial intelligence (AI) reporting guidelines in medicine and to explore their applicability to generative AI (GAI) tools, such as large language models (LLMs).
Study Design and Setting
We conducted a scoping review and reported it in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR). Five information sources were searched from inception to December 31, 2024: MEDLINE (via PubMed), the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) Network, China National Knowledge Infrastructure, FAIRsharing, and Google Scholar. Two reviewers independently screened records and extracted data using a predefined Excel template. Extracted data included guideline characteristics (eg, development methods, target audience, AI domain), adherence to EQUATOR Network recommendations, and consensus methodologies. Discrepancies were resolved by a third reviewer.
Results
Sixty-eight AI reporting guidelines were included; 48.5% focused on general AI, whereas only 7.4% addressed GAI/LLMs. Methodological rigor was limited; 39.7% described development processes, 42.6% involved multidisciplinary experts, and 33.8% followed EQUATOR recommendations. Significant overlap existed, particularly in medical imaging (20.6% of guidelines). GAI-specific guidelines (14.7%) lacked comprehensive coverage and methodological transparency.
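The percentages above are all fractions of the 68 included guidelines, so each one can be converted back to a whole-number count. The sketch below is only an arithmetic illustration: the counts it produces are inferred from rounding and are not reported in the abstract, and the dictionary keys are hypothetical labels chosen here for readability.

```python
# Back-calculate guideline counts from the percentages reported in the Results.
# Assumption: every percentage is a share of the same 68 included guidelines.
# The counts are inferred, not taken from the paper.

TOTAL_GUIDELINES = 68

# Percentages as reported in the abstract (labels are hypothetical).
reported_pct = {
    "general_ai": 48.5,
    "addressed_gai_llm": 7.4,
    "described_development": 39.7,
    "multidisciplinary_experts": 42.6,
    "followed_equator": 33.8,
    "medical_imaging": 20.6,
    "gai_specific": 14.7,
}


def implied_count(pct: float, total: int) -> int:
    """Return the whole-number count closest to pct percent of total."""
    return round(pct / 100 * total)


def round_trips(pct: float, total: int) -> bool:
    """Check that the inferred count reproduces the reported percentage."""
    count = implied_count(pct, total)
    return round(100 * count / total, 1) == pct


implied = {k: implied_count(v, TOTAL_GUIDELINES) for k, v in reported_pct.items()}

for label, count in implied.items():
    ok = round_trips(reported_pct[label], TOTAL_GUIDELINES)
    print(f"{label}: {count}/{TOTAL_GUIDELINES} (consistent with report: {ok})")
```

Each inferred count reproduces its reported percentage to one decimal place, which supports reading all seven figures as shares of the same denominator.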
Conclusion
Existing AI reporting guidelines in medicine show suboptimal methodological rigor, considerable redundancy, and insufficient coverage of GAI applications. Future and updated guidelines should prioritize standardized development processes, multidisciplinary collaboration, and an expanded focus on emerging AI technologies such as LLMs.
About the journal:
The Journal of Clinical Epidemiology strives to enhance the quality of clinical and patient-oriented healthcare research by advancing and applying innovative methods in conducting, presenting, synthesizing, disseminating, and translating research results into optimal clinical practice. Special emphasis is placed on training new generations of scientists and clinical practice leaders.