{"title":"AI-assisted statistical review of 100 oncology research articles: compliance with SAMPL guidelines","authors":"Michal Ordak","doi":"10.1016/j.retram.2025.103544","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div><strong>:</strong> Ensuring accurate statistical reporting is critical in oncology research, where data-driven conclusions impact clinical decision-making. Despite standardized guidelines such as the Statistical Analyses and Methods in the Published Literature (SAMPL), adherence remains inconsistent. This study evaluates the performance of Gemini Advanced 2.0 Flash, an AI model, in assessing compliance with SAMPL guidelines in oncology research articles.</div></div><div><h3>Methods</h3><div><strong>:</strong> A total of 100 original research articles published in four peer-reviewed oncology journals (October 2024–February 2025) were analyzed. Gemini Advanced 2.0 Flash assessed adherence to ten key SAMPL guidelines, categorizing each as \"not met,\" \"partially met,\" or \"fully met.\" AI evaluations were compared with independent assessments by a statistical editor, with agreement quantified using Cohen’s Kappa coefficient.</div></div><div><h3>Results</h3><div>The overall weighted Kappa coefficient was 0.77 (95 % CI: 0.6–0.94), indicating substantial agreement between AI and manual assessment. Full agreement (Kappa = 1) was found for four guidelines, including naming statistical packages and reporting confidence intervals. High agreement was observed for specifying statistical methods (Kappa = 0.85) and confirming test assumptions (Kappa = 0.75). Moderate agreement was noted for summarizing non-normally distributed data (Kappa = 0.42) and specifying test directionality (Kappa = 0.43). The lowest agreement (Kappa = 0.37) was observed in multiple comparison adjustments due to missing justifications for post hoc tests.</div></div><div><h3>Conclusion</h3><div><strong>:</strong> AI-assisted evaluation showed substantial agreement with expert assessment, demonstrating its potential in statistical review. However, discrepancies in specific guidelines suggest human oversight remains essential for ensuring statistical rigor in oncology research. Further refinement of AI models may enhance their reliability in scientific publishing.</div></div>","PeriodicalId":54260,"journal":{"name":"Current Research in Translational Medicine","volume":"73 4","pages":"Article 103544"},"PeriodicalIF":3.0000,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Current Research in Translational Medicine","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2452318625000534","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICINE, RESEARCH & EXPERIMENTAL","Score":null,"Total":0}
Abstract
Background
Ensuring accurate statistical reporting is critical in oncology research, where data-driven conclusions impact clinical decision-making. Despite standardized guidelines such as the Statistical Analyses and Methods in the Published Literature (SAMPL), adherence remains inconsistent. This study evaluates the performance of Gemini Advanced 2.0 Flash, an AI model, in assessing compliance with SAMPL guidelines in oncology research articles.
Methods
A total of 100 original research articles published in four peer-reviewed oncology journals (October 2024–February 2025) were analyzed. Gemini Advanced 2.0 Flash assessed adherence to ten key SAMPL guidelines, categorizing each as "not met," "partially met," or "fully met." AI evaluations were compared with independent assessments by a statistical editor, with agreement quantified using Cohen's Kappa coefficient.
Results
The overall weighted Kappa coefficient was 0.77 (95 % CI: 0.6–0.94), indicating substantial agreement between AI and manual assessment. Full agreement (Kappa = 1) was found for four guidelines, including naming statistical packages and reporting confidence intervals. High agreement was observed for specifying statistical methods (Kappa = 0.85) and confirming test assumptions (Kappa = 0.75). Moderate agreement was noted for summarizing non-normally distributed data (Kappa = 0.42) and specifying test directionality (Kappa = 0.43). The lowest agreement (Kappa = 0.37) was observed in multiple comparison adjustments due to missing justifications for post hoc tests.
Conclusion
AI-assisted evaluation showed substantial agreement with expert assessment, demonstrating its potential in statistical review. However, discrepancies in specific guidelines suggest human oversight remains essential for ensuring statistical rigor in oncology research. Further refinement of AI models may enhance their reliability in scientific publishing.
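As an illustration of the agreement analysis described in the Methods, the sketch below shows one way a weighted Cohen's Kappa with a bootstrap 95% confidence interval could be computed for two raters scoring articles on the ordinal scale "not met" / "partially met" / "fully met." This is a minimal sketch, not the study's analysis code: the 0/1/2 coding, the linear weighting scheme, and the simulated ratings are illustrative assumptions.

```python
# Minimal sketch: weighted Cohen's Kappa with a bootstrap 95% CI for two raters
# scoring 100 items on a 3-level ordinal scale. The ratings are simulated.
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Ordinal coding assumed here: 0 = "not met", 1 = "partially met", 2 = "fully met".
rng = np.random.default_rng(0)
ai_scores = rng.integers(0, 3, size=100)                                  # simulated AI ratings
editor_scores = np.clip(ai_scores + rng.integers(-1, 2, size=100), 0, 2)  # simulated editor ratings

# Linear weights penalize disagreements in proportion to their ordinal distance.
kappa = cohen_kappa_score(ai_scores, editor_scores, weights="linear")

# Percentile bootstrap for a 95% confidence interval on the weighted Kappa.
n = len(ai_scores)
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    boot.append(cohen_kappa_score(ai_scores[idx], editor_scores[idx], weights="linear"))
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

print(f"Weighted Kappa = {kappa:.2f} (95% CI: {ci_low:.2f} to {ci_high:.2f})")
```

Linear versus quadratic weighting is a judgment call for ordinal scales; the abstract does not specify which weighting scheme or confidence-interval method the study used.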
Journal Introduction
Current Research in Translational Medicine is a peer-reviewed journal publishing worldwide clinical and basic research in the fields of hematology, immunology, infectiology, hematopoietic cell transplantation, and cellular and gene therapy. The journal considers for publication English-language editorials, original articles, reviews, and short reports, including case reports. Contributions are intended to draw attention to experimental medicine and translational research. Current Research in Translational Medicine periodically publishes thematic issues and is indexed in all major international databases (2017 Impact Factor: 1.9).
Core areas covered in Current Research in Translational Medicine are:
Hematology,
Immunology,
Infectiology,
Hematopoietic Cell Transplantation,
Cellular and Gene Therapy.