A review and systematic guide to counteracting medical data scarcity for AI applications
Fabian Gröger, Ludovic Amruthalingam, Simone Lionetti, Alexander A. Navarini, Fabian Ille, Marc Pouly
Computer Methods and Programs in Biomedicine Update, vol. 8, Article 100220, 2025. DOI: 10.1016/j.cmpbup.2025.100220. https://www.sciencedirect.com/science/article/pii/S266699002500045X
Artificial intelligence has the potential to improve the scalability, objectivity, and precision of the healthcare system as a whole. These improvements are made possible by the growth of medical databases and the progress of deep learning approaches, which enable automated analysis of both structured and unstructured data. Although the overall size of medical datasets continues to increase, data scarcity remains a problem because of challenges in the medical domain, such as rare diseases, difficult and expensive annotation, and restricted population coverage. Machine learning models trained without appropriate measures to counteract this scarcity are often biased and unreliable in real-world settings. This paper systematically examines the different challenges arising from medical data scarcity, their implications, and state-of-the-art mitigation approaches. It includes studies from the general machine learning community and describes how their findings translate to medical applications. This review is meant as a practical resource for researchers who want to develop reliable machine learning models for medical applications when data are scarce.