A Novel Taxonomy for Navigating and Classifying Synthetic Data in Healthcare Applications
Bram van Dijk, Saif ul Islam, Jim Achterberg, Hafiz Muhammad Waseem, Parisis Gallos, Gregory Epiphaniou, Carsten Maple, Marcel Haas, Marco Spruit
arXiv - CS - Computers and Society, published 2024-09-01, doi: arxiv-2409.00701
Abstract
Data-driven technologies have improved the efficiency, reliability and
effectiveness of healthcare services, but come with an increasing demand for
data, which is challenging due to privacy-related constraints on sharing data
in healthcare contexts. Synthetic data has recently gained popularity as
a potential solution, but in the flurry of current research it can be hard to
get an overview of its possibilities. This paper proposes a novel taxonomy of synthetic data
in healthcare to navigate the landscape in terms of three main varieties. Data
Proportion comprises different ratios of synthetic data in a dataset and
their associated pros and cons. Data Modality refers to the different data formats
amenable to synthesis and their format-specific challenges. Data Transformation
concerns improving specific aspects of a dataset like its utility or privacy
with synthetic data. Our taxonomy aims to help researchers in the healthcare
domain who are interested in synthetic data grasp what types of datasets, data
modalities, and transformations are possible with synthetic data, and where the
challenges and overlaps between the varieties lie.