{"title":"Facial Aesthetics in Artificial Intelligence: First Investigation Comparing Results in a Generative AI Study.","authors":"Arsany Yassa, Arya Akhavan, Solina Ayad, Olivia Ayad, Anthony Colon, Ashley Ignatiuk","doi":"","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Patients undergoing facial plastic surgery are increasingly using artificial intelligence (AI) to visualize expected postoperative results. However, AI training models' variations and lack of proper surgical photography in training sets may result in inaccurate simulations and unrealistic patient expectations. This study aimed to determine if AI-generated images can deliver realistic expectations and be useful in a surgical context.</p><p><strong>Methods: </strong>The authors used AI platforms Midjourney (Midjourney, Inc), Leonardo (Canva), and Stable Diffusion (Stability AI) to generate otoplasty, genioplasty, rhinoplasty, and platysmaplasty images. Board-certified plastic surgeons and residents assessed these images based on 11 metrics that were grouped into 2 criteria: realism and clinical value. Analysis of variance and Tukey Honestly Significant Difference post-hoc analysis tests were used for data analysis.</p><p><strong>Results: </strong>Performance for each metric was reported as mean ± SD. Midjourney outperformed Stable Diffusion significantly in realism (3.57 ± 0.58 vs 2.90 ± 0.65; <i>P</i> < .01), while no significant differences in clinical value were observed between the AI models (<i>P</i> = .38). Leonardo outperformed Stable Diffusion significantly in size and volume accuracy (3.83 ± 0.24 vs 3.00 ± 0.36; <i>P</i> = .02). Stable Diffusion underperformed significantly in anatomical correctness, age simulation, and texture mapping (most <i>P</i> values were less than .01). All 3 AI models consistently underperformed in healing and scarring prediction. 
The uncanny valley effect was also observed by the evaluators.</p><p><strong>Conclusions: </strong>Certain AI models outperformed others in generating the images, with evaluator opinions varying on their realism and clinical value. Some images reasonably depicted the target area and the expected outcome; however, many images displayed inappropriate postsurgical outcomes or provoked the uncanny valley effect with their lack of realism. The authors stress the need for AI improvement to produce better pre- and postoperative images, and plan for further research comparing AI-generated visuals with actual surgical results.</p>","PeriodicalId":93993,"journal":{"name":"Eplasty","volume":"25 ","pages":"e13"},"PeriodicalIF":0.0000,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12257972/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Eplasty","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Background: Patients undergoing facial plastic surgery are increasingly using artificial intelligence (AI) to visualize expected postoperative results. However, variation among AI training models and the lack of proper surgical photography in training sets may produce inaccurate simulations and unrealistic patient expectations. This study aimed to determine whether AI-generated images can set realistic expectations and be useful in a surgical context.
Methods: The authors used the AI platforms Midjourney (Midjourney, Inc), Leonardo (Canva), and Stable Diffusion (Stability AI) to generate otoplasty, genioplasty, rhinoplasty, and platysmaplasty images. Board-certified plastic surgeons and residents assessed these images on 11 metrics grouped into 2 criteria: realism and clinical value. Analysis of variance (ANOVA) and Tukey Honestly Significant Difference (HSD) post-hoc tests were used for data analysis.
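The one-way ANOVA used here compares mean ratings across the three AI platforms by partitioning variance into between-group and within-group components. A minimal sketch of that computation, using purely illustrative Likert-scale ratings (the study's actual data are not reproduced here):

```python
# Hypothetical example: one-way ANOVA across three groups of realism
# ratings (1-5 Likert scale), the omnibus test used in the study.
# The rating values below are illustrative, not the study's data.

def one_way_anova(*groups):
    """Return the F statistic and its degrees of freedom for a one-way ANOVA."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

midjourney = [4, 4, 3, 4, 3]   # illustrative realism scores
leonardo   = [3, 4, 3, 3, 4]
stable_d   = [3, 2, 3, 3, 2]

f, df1, df2 = one_way_anova(midjourney, leonardo, stable_d)
print(f"F({df1}, {df2}) = {f:.2f}")  # prints "F(2, 12) = 4.67"
```

In practice a significant omnibus F would be followed by the Tukey HSD procedure (e.g. `scipy.stats.tukey_hsd`) to identify which pairs of platforms differ, which is how the pairwise comparisons in the Results were obtained.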
Results: Performance for each metric was reported as mean ± SD. Midjourney outperformed Stable Diffusion significantly in realism (3.57 ± 0.58 vs 2.90 ± 0.65; P < .01), while no significant differences in clinical value were observed between the AI models (P = .38). Leonardo outperformed Stable Diffusion significantly in size and volume accuracy (3.83 ± 0.24 vs 3.00 ± 0.36; P = .02). Stable Diffusion underperformed significantly in anatomical correctness, age simulation, and texture mapping (most P values were less than .01). All 3 AI models consistently underperformed in healing and scarring prediction. The uncanny valley effect was also observed by the evaluators.
Conclusions: Certain AI models outperformed others in generating the images, and evaluator opinions varied on their realism and clinical value. Some images reasonably depicted the target area and the expected outcome; however, many displayed inappropriate postsurgical outcomes or provoked the uncanny valley effect through their lack of realism. The authors stress the need for improved AI models that produce better pre- and postoperative images, and plan further research comparing AI-generated visuals with actual surgical results.