Generative Artificial Intelligence Enhancements for Reducing Image-based Training Data Requirements
Dake Chen PhD, Ying Han MD, PhD, Jacque Duncan MD, Lin Jia PhD, Jing Shan MD, PhD
Ophthalmology Science, volume 4, issue 5, Article 100531. Published 2024-04-14.
DOI: 10.1016/j.xops.2024.100531
https://www.sciencedirect.com/science/article/pii/S2666914524000678
Citations: 0
Abstract
Objective
Training data fuel and shape the development of artificial intelligence (AI) models. Intensive data requirements are a major bottleneck limiting the success of AI tools in sectors with inherently scarce data. In health care, training data are difficult to curate, triggering growing concerns that the current lack of access to health care by under-privileged social groups will translate into future bias in health care AI models. In this report, we developed an autoencoder to grow and enhance inherently scarce datasets and so reduce our dependence on big data.
Design
Computational study with open-source data.
Subjects
The data were obtained from 6 open-source datasets comprising patients aged 40–80 years in Singapore, China, India, and Spain.
Methods
The reported framework generates synthetic images based on real-world patient imaging data. As a test case, we used the autoencoder to expand publicly available training sets of optic disc photos, and evaluated the ability of the resultant datasets to train AI models in the detection of glaucomatous optic neuropathy.
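The abstract does not specify the autoencoder's architecture or how synthetic images are sampled, so the following is only a toy sketch of the general idea, not the authors' method. It assumes a linear autoencoder, four-value vectors standing in for images, and Gaussian noise added to latent codes to produce new samples; all function names, the data, and the hyperparameters are hypothetical.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def init_weights(n_features, latent_dim, seed=0):
    """Small random encoder (latent_dim x n_features) and decoder (n_features x latent_dim)."""
    rng = random.Random(seed)
    w_enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)] for _ in range(latent_dim)]
    w_dec = [[rng.uniform(-0.1, 0.1) for _ in range(latent_dim)] for _ in range(n_features)]
    return w_enc, w_dec

def reconstruction_error(data, w_enc, w_dec):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        x_hat = matvec(w_dec, matvec(w_enc, x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)

def train(data, w_enc, w_dec, epochs=300, lr=0.05):
    """Fit the linear autoencoder in place with per-sample gradient descent."""
    for _ in range(epochs):
        for x in data:
            z = matvec(w_enc, x)
            err = [xh - xi for xh, xi in zip(matvec(w_dec, z), x)]
            # Gradient of 0.5 * ||x_hat - x||^2 with respect to the latent code,
            # computed before the decoder weights are updated.
            grad_z = [sum(w_dec[i][k] * err[i] for i in range(len(x)))
                      for k in range(len(z))]
            for i, row in enumerate(w_dec):
                for k in range(len(z)):
                    row[k] -= lr * err[i] * z[k]
            for k, row in enumerate(w_enc):
                for j in range(len(x)):
                    row[j] -= lr * grad_z[k] * x[j]

def synthesize(data, w_enc, w_dec, n_new, noise=0.05, seed=1):
    """Decode noise-perturbed latent codes of real samples (an assumed sampling scheme)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_new):
        z = matvec(w_enc, rng.choice(data))
        z = [zi + rng.gauss(0.0, noise) for zi in z]
        samples.append(matvec(w_dec, z))
    return samples

# Hypothetical "images": four pixel intensities each, two visual patterns.
real = [
    [0.9, 0.1, 0.9, 0.1],
    [0.8, 0.2, 0.8, 0.2],
    [0.1, 0.9, 0.1, 0.9],
    [0.2, 0.8, 0.2, 0.8],
]

w_enc, w_dec = init_weights(n_features=4, latent_dim=2)
before = reconstruction_error(real, w_enc, w_dec)
train(real, w_enc, w_dec)
after = reconstruction_error(real, w_enc, w_dec)

synthetic = synthesize(real, w_enc, w_dec, n_new=8)
augmented = real + synthetic  # expanded training set: real plus synthetic samples
print(f"reconstruction error: {before:.3f} -> {after:.3f}; training set size: {len(augmented)}")
```

In a realistic setting the encoder and decoder would be deep convolutional networks trained on actual fundus photographs, and the expanded set would then be passed to the glaucoma classifier's training loop.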
Main Outcome Measures
The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of the glaucoma detector. A higher AUC indicates better detection performance.
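For readers unfamiliar with the metric, AUC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below uses that pairwise definition; the function name `roc_auc` and the labels and scores are illustrative assumptions, not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical detector outputs: 1 = glaucoma, 0 = healthy.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of cases, so a rank-based implementation would be preferable for large test sets.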
Results
Enhancing datasets with synthetic images generated by the autoencoder produced superior training sets that improved the performance of AI models.
Conclusions
Our findings here help address the increasingly untenable data volume and quality requirements for AI model development and have implications beyond health care, toward empowering AI adoption for all similarly data-challenged fields.
Financial Disclosure(s)
The authors have no proprietary or commercial interest in any materials discussed in this article.