Tackling the small data problem in medical image classification with artificial intelligence: a systematic review

Stefano Piffer, Leonardo Ubaldi, Sabina Tangaro, Alessandra Retico and Cinzia Talamonti
Progress in Biomedical Engineering, published 2024-06-16
DOI: 10.1088/2516-1091/ad525b (https://doi.org/10.1088/2516-1091/ad525b)

Abstract

Although AI research in medical imaging has attracted growing interest, training models requires a large amount of data. In this domain, the available datasets are limited because collecting new data is either not feasible or requires burdensome resources. Researchers therefore face the problem of small datasets and must apply dedicated techniques to combat overfitting. We retrieved 147 peer-reviewed articles from PubMed, published in English up to 31 July 2022, which were assessed by two independent reviewers. We followed the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines for paper selection, and 77 studies were regarded as eligible for the scope of this review. Adherence to reporting standards was assessed using the TRIPOD statement (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis). To address the small data issue, transfer learning, basic data augmentation and generative adversarial networks were applied in 75%, 69% and 14% of cases, respectively. More than 60% of the authors performed binary classification, given the data scarcity and the difficulty of the tasks. Concerning generalizability, only four studies explicitly stated that an external validation of the developed model was carried out. Full access to all datasets and code was severely limited (unavailable in more than 80% of studies). Adherence to reporting standards was suboptimal (<50% adherence for 13 of 37 TRIPOD items). The goal of this review is to provide a comprehensive survey of recent advances in dealing with small sample sizes in medical imaging. It also advocates transparency and improved quality in publications, as well as adherence to existing reporting standards.
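
To make the term "basic data augmentation" concrete, the following minimal sketch shows the kind of geometric transformations it commonly refers to (flips and a 90-degree rotation). It is an illustration only: the review does not prescribe an implementation, and the function names and the nested-list image representation are assumptions made for this example.

```python
from typing import List

# Assumption for this sketch: a 2D grayscale image stored as rows of pixel values.
Image = List[List[float]]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Mirror the image top-to-bottom."""
    return [row[:] for row in img[::-1]]


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate the image by 90 degrees clockwise."""
    # Reversing the row order and transposing yields a clockwise rotation.
    return [list(col) for col in zip(*img[::-1])]


def augment(img: Image) -> List[Image]:
    """Return the original image plus three geometric variants (hypothetical helper)."""
    return [img, flip_horizontal(img), flip_vertical(img), rotate_90_clockwise(img)]


if __name__ == "__main__":
    sample = [[1, 2], [3, 4]]
    for variant in augment(sample):
        print(variant)
```

Each input image yields four training samples here. Whether a given transformation preserves the label depends on the imaging task (for example, a horizontal flip may not be appropriate where laterality matters), so the choice of transformations would need to be checked case by case.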