Generation of Fundus Fluorescein Angiography Videos for Health Care Data Sharing
Xinyuan Wu, Lili Wang, Ruoyu Chen, Bowen Liu, Weiyi Zhang, Xi Yang, Yifan Feng, Mingguang He, Danli Shi
JAMA Ophthalmology. Published 2025-06-26. doi:10.1001/jamaophthalmol.2025.1419
Abstract
Importance
Medical data sharing faces strict restrictions. Text-to-video generation shows potential for creating realistic medical data while preserving privacy, offering a solution for cross-center data sharing and medical education.
Objective
To develop and evaluate a text-to-video generative artificial intelligence (AI)-driven model that converts the text of reports into dynamic fundus fluorescein angiography (FFA) videos, enabling visualization of retinal vascular and structural abnormalities.
Design, Setting, and Participants
This study retrospectively collected anonymized FFA data from a tertiary hospital in China. The dataset included both the medical records and FFA examinations of patients assessed between November 2016 and December 2019. A text-to-video model was developed and evaluated. The AI-driven model integrated the wavelet-flow variational autoencoder and the diffusion transformer.
Main Outcomes and Measures
The AI-driven model's performance was assessed through objective metrics (Fréchet video distance, learned perceptual image patch similarity score, and visual question answering score [VQAScore]). The domain-specific evaluation for the generated FFA videos was measured by the bidirectional encoder representations from transformers score (BERTScore). Image retrieval was evaluated using a Recall@K score. Each video was rated for quality by 3 ophthalmologists on a scale of 1 (excellent) to 5 (very poor).
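As a rough illustration of the Recall@K measure named above, the sketch below scores each generated video against a gallery of real videos and counts how often its paired real video lands in the top K. The abstract does not describe the retrieval protocol, so the similarity matrix, the pairing in `toy_truth`, and the function name `recall_at_k` are all hypothetical; this is not the authors' implementation.

```python
from typing import Sequence


def recall_at_k(
    similarity: Sequence[Sequence[float]],
    true_index: Sequence[int],
    k: int,
) -> float:
    """Fraction of queries whose true match ranks in the top k by similarity."""
    hits = 0
    for scores, target in zip(similarity, true_index):
        # Rank gallery items from most to least similar to this query.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(true_index)


# Toy data (assumed, not from the study): 3 generated videos (rows)
# scored against 4 real videos (columns).
toy_similarity = [
    [0.9, 0.1, 0.3, 0.2],  # paired real video (index 0) ranks 1st
    [0.4, 0.2, 0.8, 0.5],  # paired real video (index 1) ranks 4th
    [0.1, 0.6, 0.5, 0.7],  # paired real video (index 2) ranks 3rd
]
toy_truth = [0, 1, 2]

for k in (1, 3, 4):
    print(k, recall_at_k(toy_similarity, toy_truth, k))
```

Under this reading, a low Recall@K means a generated video rarely retrieves its real counterpart, which is the sense in which the study treats low scores as evidence of privacy preservation.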
Results
A total of 3625 FFA videos were included (2851 videos [78.6%] for training, 387 videos [10.7%] for validation, and 387 videos [10.7%] for testing). The AI-generated FFA videos demonstrated retinal abnormalities from the input text (Fréchet video distance of 2273, a mean learned perceptual image patch similarity score of 0.48 [SD, 0.04], and a mean VQAScore of 0.61 [SD, 0.08]). The domain-specific evaluations showed alignment between the generated videos and textual prompts (mean BERTScore, 0.35 [SD, 0.09]). The Recall@K scores were 0.02 for K = 5, 0.04 for K = 10, and 0.16 for K = 50, yielding a mean score of 0.073, reflecting disparities between AI-generated and real clinical videos and demonstrating privacy-preserving effectiveness. For assessment of visual quality of the FFA videos by the 3 ophthalmologists, the mean score was 1.57 (SD, 0.44).
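The split percentages and the mean Recall@K reported above can be checked with a few lines of arithmetic. The sketch below uses only the figures stated in the Results; the variable names are illustrative, and treating the mean as a simple average of the three K values is an assumption that happens to match the reported 0.073.

```python
# Figures copied from the Results paragraph.
total_videos = 3625
splits = {"training": 2851, "validation": 387, "testing": 387}

# The three subsets should account for every video.
assert sum(splits.values()) == total_videos

# Percentage share of each subset, rounded to one decimal place.
shares = {name: round(100 * n / total_videos, 1) for name, n in splits.items()}
print(shares)  # {'training': 78.6, 'validation': 10.7, 'testing': 10.7}

# Mean of the three reported Recall@K values (assumed to be an unweighted mean).
recall = {5: 0.02, 10: 0.04, 50: 0.16}
mean_recall = sum(recall.values()) / len(recall)
print(round(mean_recall, 3))  # 0.073
```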
Conclusions and Relevance
This study demonstrated that an AI-driven text-to-video model generated FFA videos from textual descriptions, potentially improving visualization for clinical and educational purposes. The privacy-preserving nature of the model may address key challenges in data sharing while trying to ensure compliance with confidentiality standards.
About the journal:
JAMA Ophthalmology, published continuously since 1869, is an international, peer-reviewed journal dedicated to ophthalmology and visual science. In 2019, the journal marked 150 years of uninterrupted publication. It is a member of the JAMA Network, a consortium of peer-reviewed general medical and specialty publications.