{"title":"Facial Aesthetics in Artificial Intelligence: First Investigation Comparing Results in a Generative AI Study.","authors":"Arsany Yassa, Arya Akhavan, Solina Ayad, Olivia Ayad, Anthony Colon, Ashley Ignatiuk","doi":"","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Patients undergoing facial plastic surgery are increasingly using artificial intelligence (AI) to visualize expected postoperative results. However, AI training models' variations and lack of proper surgical photography in training sets may result in inaccurate simulations and unrealistic patient expectations. This study aimed to determine if AI-generated images can deliver realistic expectations and be useful in a surgical context.</p><p><strong>Methods: </strong>The authors used AI platforms Midjourney (Midjourney, Inc), Leonardo (Canva), and Stable Diffusion (Stability AI) to generate otoplasty, genioplasty, rhinoplasty, and platysmaplasty images. Board-certified plastic surgeons and residents assessed these images based on 11 metrics that were grouped into 2 criteria: realism and clinical value. Analysis of variance and Tukey Honestly Significant Difference post-hoc analysis tests were used for data analysis.</p><p><strong>Results: </strong>Performance for each metric was reported as mean ± SD. Midjourney outperformed Stable Diffusion significantly in realism (3.57 ± 0.58 vs 2.90 ± 0.65; <i>P</i> < .01), while no significant differences in clinical value were observed between the AI models (<i>P</i> = .38). Leonardo outperformed Stable Diffusion significantly in size and volume accuracy (3.83 ± 0.24 vs 3.00 ± 0.36; <i>P</i> = .02). Stable Diffusion underperformed significantly in anatomical correctness, age simulation, and texture mapping (most <i>P</i> values were less than .01). All 3 AI models consistently underperformed in healing and scarring prediction. 
The uncanny valley effect was also observed by the evaluators.</p><p><strong>Conclusions: </strong>Certain AI models outperformed others in generating the images, with evaluator opinions varying on their realism and clinical value. Some images reasonably depicted the target area and the expected outcome; however, many images displayed inappropriate postsurgical outcomes or provoked the uncanny valley effect with their lack of realism. The authors stress the need for AI improvement to produce better pre- and postoperative images, and plan for further research comparing AI-generated visuals with actual surgical results.</p>","PeriodicalId":93993,"journal":{"name":"Eplasty","volume":"25 ","pages":"e13"},"PeriodicalIF":0.0000,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12257972/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Eplasty","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Background: Patients undergoing facial plastic surgery are increasingly using artificial intelligence (AI) to visualize expected postoperative results. However, variation among AI training models and the lack of proper surgical photography in training sets may produce inaccurate simulations and unrealistic patient expectations. This study aimed to determine whether AI-generated images can set realistic expectations and be useful in a surgical context.
Methods: The authors used the AI platforms Midjourney (Midjourney, Inc), Leonardo (Canva), and Stable Diffusion (Stability AI) to generate otoplasty, genioplasty, rhinoplasty, and platysmaplasty images. Board-certified plastic surgeons and residents assessed these images on 11 metrics grouped into 2 criteria: realism and clinical value. Analysis of variance (ANOVA) and Tukey Honestly Significant Difference (HSD) post-hoc tests were used for data analysis.
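The one-way ANOVA used here compares mean ratings across the three AI platforms by partitioning variance into between-group and within-group components. A minimal sketch of that computation, using purely illustrative Likert-scale ratings (the study's actual data are not reproduced here):

```python
# Hypothetical example: one-way ANOVA across three groups of realism
# ratings (1-5 Likert scale), the omnibus test used in the study.
# The rating values below are illustrative, not the study's data.

def one_way_anova(*groups):
    """Return the F statistic and its degrees of freedom for a one-way ANOVA."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

midjourney = [4, 4, 3, 4, 3]   # illustrative realism scores
leonardo   = [3, 4, 3, 3, 4]
stable_d   = [3, 2, 3, 3, 2]

f, df1, df2 = one_way_anova(midjourney, leonardo, stable_d)
print(f"F({df1}, {df2}) = {f:.2f}")  # prints "F(2, 12) = 4.67"
```

In practice a significant omnibus F would be followed by the Tukey HSD procedure (e.g. `scipy.stats.tukey_hsd`) to identify which pairs of platforms differ, which is how the pairwise comparisons in the Results were obtained.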
Results: Performance for each metric was reported as mean ± SD. Midjourney outperformed Stable Diffusion significantly in realism (3.57 ± 0.58 vs 2.90 ± 0.65; P < .01), while no significant differences in clinical value were observed between the AI models (P = .38). Leonardo outperformed Stable Diffusion significantly in size and volume accuracy (3.83 ± 0.24 vs 3.00 ± 0.36; P = .02). Stable Diffusion underperformed significantly in anatomical correctness, age simulation, and texture mapping (most P values were less than .01). All 3 AI models consistently underperformed in healing and scarring prediction. The uncanny valley effect was also observed by the evaluators.
Conclusions: Certain AI models outperformed others in generating the images, and evaluator opinions varied on their realism and clinical value. Some images reasonably depicted the target area and the expected outcome; however, many displayed inappropriate postsurgical outcomes or provoked the uncanny valley effect through their lack of realism. The authors stress the need for improved AI models that produce better pre- and postoperative images, and plan further research comparing AI-generated visuals with actual surgical results.