The Case for Synthetic Images Generated by Artificial Intelligence
Bingwen Eugene Fan, Stefan Winkler
American Journal of Hematology, vol. 100, no. 10, pp. 1910-1911. Published 2025-07-23.
DOI: 10.1002/ajh.70019
https://onlinelibrary.wiley.com/doi/10.1002/ajh.70019
Citations: 0
Abstract
The latest deep learning models boast powerful capabilities for data generation in various modalities, most notably text and images. A recent editorial by Bucci and Parini [1] focused on the opportunities this creates for researchers acting in bad faith, who may use such tools for image falsification and scientific fraud. As authors cited in this context [2], we offer a necessary counterpoint guided by Melvin Kranzberg's first law: “Technology is neither good nor bad; nor is it neutral” [3]. There is no doubt that this new technology raises ethical challenges for users. Downsides such as the easy production of “deep fake” images and videos are clearly worrying. However, we must not forget the beneficial use cases of these generative tools, whose transformative scientific applications are already advancing our field.
The development of powerful diffusion models represents a technical breakthrough in synthetic image generation. These models learn to reverse the information loss caused by progressive noise addition, and can thereby generate images that closely follow the distribution of the dataset they were trained on. This approach can be considered part of a set of more general techniques, collectively termed data augmentation, which enhance training datasets through various transformations in order to increase the quantity and diversity of training images. Basic transformations include geometric distortions, color adjustment, noise injection, filtering, and others. More advanced methods based on deep neural networks have also been developed, such as style transfer, super-resolution, or in-painting [4]. Synthetic image generation is a natural next step in this process. When ethically deployed, augmentation and generation methods can significantly improve machine learning performance, robustness, and generalization.
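The basic transformations named above can be illustrated with a minimal, dependency-free sketch. It is not taken from the cited works: the function names, the grayscale list-of-rows image format, and the default parameter values are all assumptions chosen for illustration, and a real pipeline would more likely rely on an array or imaging library.

```python
import random

def augment(image, rng, noise_std=0.02, max_shift=0.1):
    """Return one randomly transformed copy of a grayscale image.

    `image` is a list of rows of floats in [0, 1]. The parameter names
    and default values are illustrative assumptions.
    """
    out = [list(row) for row in image]
    # Geometric transformation: random horizontal flip.
    if rng.random() < 0.5:
        out = [row[::-1] for row in out]
    # Geometric transformation: rotate clockwise by 0, 90, 180, or 270 degrees.
    for _ in range(rng.randrange(4)):
        out = [list(col) for col in zip(*out[::-1])]
    # Color adjustment: one brightness shift applied to every pixel.
    shift = rng.uniform(-max_shift, max_shift)
    # Noise injection: independent Gaussian noise per pixel, clipped to [0, 1].
    return [
        [min(1.0, max(0.0, px + shift + rng.gauss(0.0, noise_std))) for px in row]
        for row in out
    ]

def augment_dataset(images, copies_per_image, seed=0):
    """Keep each original and add `copies_per_image` augmented variants of it."""
    rng = random.Random(seed)
    augmented = []
    for image in images:
        augmented.append(image)
        augmented.extend(augment(image, rng) for _ in range(copies_per_image))
    return augmented
```

Calling `augment_dataset(images, copies_per_image=3)` on a list of images returns four entries per original image. Fixing the seed makes the augmented set reproducible, which helps when the provenance of training data has to be documented.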
This is particularly vital in hematology, where three critical constraints converge: scarce protected patient data, limited rare-disease cohorts, and costly expert annotations. Here, synthetic images offer demonstrable solutions. Firstly, synthetic images of cells from bone marrow smears [5] and peripheral blood films [2, 6] enable cross-institutional collaboration without breaching patient confidentiality. Secondly, combining synthetic and real microscopic cell images enhances classification accuracy in diagnostics [7]. Lastly, augmented datasets reduce reliance on scarce annotated samples while improving model generalizability [4].
We agree that malevolent use demands governance: clear labeling, provenance documentation, and algorithmic transparency are essential. Generative models remain imperfect; ensuring synthetic images are realistic, diverse, and non-inferential requires ongoing refinement across modalities [8]. Yet safeguards are advancing: the Nature portfolio of journals states that “Editors may use software to screen images for manipulation…. Editors may request the unprocessed data files to help in manuscript evaluation during the peer review process…. (and) recommend retaining unprocessed data and metadata files after publication, ideally archiving data in perpetuity” [9]. We support tamper-proof traceability mechanisms to ensure data integrity. Cryptographic hashing within existing repositories is a necessary component and avoids the cost of energy-intensive blockchain solutions, but it is not sufficient by itself to guarantee provenance or prevent post hoc manipulation. Secure, immutable, and independently verifiable registries, such as lightweight blockchain [10] or distributed ledger technologies, can additionally provide public auditability and trusted timestamping. Efficient, low-energy blockchain systems designed for hash registration are already in use in scientific data certification, and their adoption in synthetic image workflows would bolster both transparency and trust.
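The difference between a bare hash and a registry can be made concrete with a small sketch. It uses SHA-256 from Python's standard library and a simple in-memory hash chain; the record fields, function names, and generator label are hypothetical and do not describe any specific repository or the system cited in [10].

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder predecessor hash for the first record (assumption)

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def record_hash(record: dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes identically.
    return sha256_hex(json.dumps(record, sort_keys=True).encode("utf-8"))

def append_record(ledger: list, image_bytes: bytes, generator: str) -> None:
    """Register one synthetic image; the field names are hypothetical."""
    ledger.append({
        "image_sha256": sha256_hex(image_bytes),
        "label": "synthetic",
        "generator": generator,
        "registered_at": datetime.now(timezone.utc).isoformat(),
        "prev_record_hash": record_hash(ledger[-1]) if ledger else GENESIS,
    })

def verify_ledger(ledger: list) -> bool:
    """Check that every record still links to the hash of its predecessor."""
    prev = GENESIS
    for record in ledger:
        if record["prev_record_hash"] != prev:
            return False
        prev = record_hash(record)
    return True

def image_matches(ledger: list, index: int, image_bytes: bytes) -> bool:
    """Check that an image file is byte-identical to the registered one."""
    return ledger[index]["image_sha256"] == sha256_hex(image_bytes)
```

In this sketch, `image_matches` only shows that a file is unchanged relative to its record, while `verify_ledger` detects edits to earlier records because each record stores the hash of its predecessor. The most recent record is protected only once its hash is published or timestamped by an independent party, which is the gap that publicly auditable registries are meant to close.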
In conclusion, scientists need to understand the ramifications of inconsiderate or even malevolent use of synthetically generated images. Governments, institutions, journals, and regulatory bodies should provide clear frameworks and guidelines on what is appropriate, while actively promoting ethical applications that overcome data scarcity and privacy barriers. Where oversight of synthetic images is weak, it enables fraud; where governance is rigorous, it facilitates trust and progress. Clear labeling of such images, accurate description of data provenance, publicly auditable timestamping systems, and the open sourcing of datasets and algorithms become even more important in the face of these powerful new tools. Only through judicious stewardship can we ensure these tools fulfill their potential: not as instruments of deception, but as engines of discovery.
Bingwen Eugene Fan and Stefan Winkler contributed to the creation of the manuscript.
We declare no conflicts of interest. Bingwen Eugene Fan is supported by the National Medical Research Council (NMRC) Clinician Innovator Development Award (NMRC/CIDA19May-0004) and the NMRC Research Training Fellowship (RTF24jan-0017).
About the journal:
The American Journal of Hematology offers extensive coverage of experimental and clinical aspects of blood diseases in humans and animal models. The journal publishes original contributions in both non-malignant and malignant hematological diseases, encompassing clinical and basic studies in areas such as hemostasis, thrombosis, immunology, blood banking, and stem cell biology. Clinical translational reports highlighting innovative therapeutic approaches for the diagnosis and treatment of hematological diseases are actively encouraged. The American Journal of Hematology features regular original laboratory and clinical research articles, brief research reports, critical reviews, images in hematology, as well as letters and correspondence.