Boris van Breugel, Tennison Liu, Dino Oglic, Mihaela van der Schaar
Nature Reviews Bioengineering, volume 2, issue 12, pages 991-1004. Published 2024-10-08. DOI: 10.1038/s44222-024-00245-7. https://www.nature.com/articles/s44222-024-00245-7
Synthetic data in biomedicine via generative artificial intelligence
The creation and application of data in biomedicine and healthcare often face privacy constraints, bias, distributional shifts, underrepresentation of certain groups and data scarcity. Some of these challenges may be addressed by synthetic data, which can be generated by deep generative models. In this Review, we highlight how data-driven synthetic data can be created not only to overcome privacy concerns associated with real data, but also to expand and improve real data. In particular, generative-model-based data augmentation can address data scarcity; synthetic data can improve data fairness and reduce bias by accounting for underrepresented groups; and unseen scenarios may be simulated with synthetic data. We further examine how biomedically relevant data, such as molecular, imaging and tabular data, may be created by foundation models through query-specific generation. We outline the challenges associated with ownership, publication, sharing and access of synthetic data. Importantly, we discuss approaches that can be applied to measure the quality of data generated by deep generative models to improve trust in synthetic data and the results derived from such data.

Synthetic data can be created by deep generative models to address challenges associated with real data, such as privacy issues, bias and data scarcity. This Review discusses the generation and application of synthetic data in biomedicine and bioengineering, including quality assessment and validation.