Tackling the small data problem in medical image classification with artificial intelligence: a systematic review

Progress in Biomedical Engineering. Published: 2024-06-16. DOI: 10.1088/2516-1091/ad525b

Stefano Piffer, Leonardo Ubaldi, Sabina Tangaro, Alessandra Retico and Cinzia Talamonti



Abstract

Although medical imaging has seen growing interest in AI research, training models requires a large amount of data. In this domain, available datasets are limited, because collecting new data is either not feasible or requires burdensome resources. Researchers are therefore faced with the problem of small datasets and have to apply techniques to counter overfitting. A total of 147 peer-reviewed articles published in English up until 31 July 2022 were retrieved from PubMed and assessed by two independent reviewers. We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines for paper selection, and 77 studies were regarded as eligible for the scope of this review. Adherence to reporting standards was assessed using the TRIPOD statement (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis). To address the small-data issue, transfer learning, basic data augmentation and generative adversarial networks (GANs) were applied in 75%, 69% and 14% of cases, respectively. More than 60% of the authors performed a binary classification, given the data scarcity and the difficulty of the tasks. Concerning generalizability, only four studies explicitly stated that an external validation of the developed model was carried out. Full access to all datasets and code was severely limited (unavailable in more than 80% of studies). Adherence to reporting standards was suboptimal (<50% adherence for 13 of 37 TRIPOD items). The goal of this review is to provide a comprehensive survey of recent advancements in dealing with small sample sizes of medical images. The review also advocates greater transparency and quality in publications, as well as adherence to existing reporting standards.
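"Basic data augmentation", one of the techniques counted in the review, generally refers to simple label-preserving transforms such as flips and rotations that enlarge a small training set. The sketch below is a minimal illustration under stated assumptions, not the method of any reviewed study: it assumes 2D grayscale images stored as NumPy arrays, and the function names `basic_augment` and `augment_dataset` are hypothetical.

```python
import numpy as np


def basic_augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a randomly flipped and rotated copy of a 2D image (hypothetical helper).

    Assumption: flips and 90-degree rotations preserve the class label.
    This does not hold for every task (e.g. when laterality matters).
    """
    out = image
    if rng.random() < 0.5:
        out = np.fliplr(out)  # horizontal flip
    if rng.random() < 0.5:
        out = np.flipud(out)  # vertical flip
    if out.shape[0] == out.shape[1]:
        k = int(rng.integers(0, 4))  # square image: rotate by 0, 90, 180 or 270 degrees
    else:
        k = 2 * int(rng.integers(0, 2))  # non-square: only 0 or 180 degrees keep the shape
    return np.rot90(out, k).copy()


def augment_dataset(images, labels, copies_per_image: int, seed: int = 0):
    """Keep each original image and append `copies_per_image` augmented copies.

    Hypothetical helper: every copy inherits the label of its source image.
    """
    rng = np.random.default_rng(seed)
    aug_x, aug_y = [], []
    for img, lab in zip(images, labels):
        aug_x.append(img)
        aug_y.append(lab)
        for _ in range(copies_per_image):
            aug_x.append(basic_augment(img, rng))
            aug_y.append(lab)
    return np.stack(aug_x), np.array(aug_y)


if __name__ == "__main__":
    # Toy stand-in for a small binary-classification dataset: 10 images of 64x64.
    x = np.random.default_rng(1).random((10, 64, 64))
    y = np.array([0, 1] * 5)
    ax, ay = augment_dataset(x, y, copies_per_image=3)
    print(ax.shape, ay.shape)  # (40, 64, 64) (40,)
```

Augmentation of this kind should be applied to the training split only, after the train/test split, so that transformed copies of an image never appear in the evaluation data and inflate the reported performance.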

