{"title":"Initial Proof-of-Concept Study for a Plastic Surgery Specific Artificial Intelligence Large Language Model: PlasticSurgeryGPT.","authors":"Berk B Ozmen, Ibrahim Berber, Graham S Schwarz","doi":"10.1093/asj/sjaf049","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>The advent of general-purpose large language models (LLMs) like ChatGPT (OpenAI, San Francisco, CA) has revolutionized natural language processing, but their applicability in specialized medical fields like plastic surgery remains limited due to a lack of domain-specific knowledge.</p><p><strong>Objectives: </strong>This study aims to develop and evaluate PlasticSurgeryGPT, a dedicated LLM fine-tuned on plastic surgery literature, to enhance performance in clinical decision support, surgical education, and research within the field.</p><p><strong>Methods: </strong>A comprehensive dataset of 25,389 plastic surgery research abstracts published between January 1, 2010, and January 1, 2024, was retrieved from PubMed. The abstracts underwent rigorous preprocessing, including text cleaning and tokenization. We fine-tuned the pre-trained GPT-2 model on this dataset using the PyTorch and HuggingFace frameworks. The performance of PlasticSurgeryGPT was evaluated against the default GPT-2 model using BLEU, METEOR, and ROUGE-1 metrics.</p><p><strong>Results: </strong>The fine-tuned model, named PlasticSurgeryGPT, demonstrated substantial improvements over the generic GPT-2 model in capturing the semantic nuances of plastic surgery text. PlasticSurgeryGPT outperformed GPT-2 across BLEU, METEOR, and ROUGE-1 metrics, with scores of 0.135519, 0.583554, and 0.216813, respectively, compared to GPT-2's scores of 0.130179, 0.550498, and 0.215494.</p><p><strong>Conclusions: </strong>PlasticSurgeryGPT represents the first plastic surgery-specific LLM, demonstrating enhanced performance in generating relevant and accurate content compared to a general-purpose model. 
This work underscores the potential of domain-specific LLMs in improving clinical practice, surgical education, and research in plastic surgery. Future studies should focus on incorporating full-text articles, multimodal data, and larger models to further enhance performance and applicability.</p>","PeriodicalId":7728,"journal":{"name":"Aesthetic Surgery Journal","volume":" ","pages":""},"PeriodicalIF":3.0000,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Aesthetic Surgery Journal","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1093/asj/sjaf049","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"SURGERY","Score":null,"Total":0}
Citations: 0
Abstract
Background: The advent of general-purpose large language models (LLMs) like ChatGPT (OpenAI, San Francisco, CA) has revolutionized natural language processing, but their applicability in specialized medical fields like plastic surgery remains limited due to a lack of domain-specific knowledge.
Objectives: This study aims to develop and evaluate PlasticSurgeryGPT, a dedicated LLM fine-tuned on plastic surgery literature, to enhance performance in clinical decision support, surgical education, and research within the field.
Methods: A comprehensive dataset of 25,389 plastic surgery research abstracts published between January 1, 2010, and January 1, 2024, was retrieved from PubMed. The abstracts underwent rigorous preprocessing, including text cleaning and tokenization. We fine-tuned the pre-trained GPT-2 model on this dataset using the PyTorch and HuggingFace frameworks. The performance of PlasticSurgeryGPT was evaluated against the default GPT-2 model using BLEU, METEOR, and ROUGE-1 metrics.
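The cleaning and tokenization steps named in the Methods can be sketched as follows. The authors' exact cleaning rules are not specified, so the regexes and example text here are illustrative assumptions; note also that actual GPT-2 fine-tuning would tokenize with the model's own byte-pair-encoding (BPE) tokenizer from HuggingFace, not the toy whitespace tokenizer shown for illustration.

```python
import re

def clean_abstract(text: str) -> str:
    """Basic text cleaning of the kind described in the Methods (illustrative)."""
    text = re.sub(r"<[^>]+>", " ", text)   # drop residual HTML/XML tags
    text = re.sub(r"\s+", " ", text)       # collapse runs of whitespace
    return text.strip()

def whitespace_tokenize(text: str) -> list[str]:
    """Toy tokenizer for illustration; GPT-2 fine-tuning uses BPE in practice."""
    return text.lower().split()

raw = "<p>Free  flap reconstruction of the\nbreast.</p>"
cleaned = clean_abstract(raw)
print(cleaned)  # Free flap reconstruction of the breast.
print(whitespace_tokenize(cleaned))
```

After preprocessing, each cleaned abstract would be fed to the pretrained GPT-2 model as a causal language-modeling training example via the PyTorch/HuggingFace stack described above.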
Results: The fine-tuned model, named PlasticSurgeryGPT, showed consistent improvements over the generic GPT-2 model in capturing the semantic nuances of plastic surgery text. PlasticSurgeryGPT outperformed GPT-2 on all three metrics, scoring 0.135519 (BLEU), 0.583554 (METEOR), and 0.216813 (ROUGE-1), compared with GPT-2's 0.130179, 0.550498, and 0.215494, respectively.
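To make the reported scores interpretable, the simplest of the three metrics, ROUGE-1, can be computed by hand: it is the F1 of unigram overlap between a generated text and a reference. The sketch below is a minimal pure-Python illustration with made-up example sentences; real evaluations use library implementations (e.g. the rouge-score package, or NLTK for BLEU and METEOR).

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1: harmonic mean of unigram precision and recall (illustrative)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("flap reconstruction of the breast",
                      "breast reconstruction with a free flap"), 3))  # → 0.545
```

A score of 0.216813 therefore means that, on average, roughly a fifth of the unigrams are shared (F1-weighted) between the model's output and the reference abstracts.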
Conclusions: PlasticSurgeryGPT represents the first plastic surgery-specific LLM, demonstrating enhanced performance in generating relevant and accurate content compared to a general-purpose model. This work underscores the potential of domain-specific LLMs in improving clinical practice, surgical education, and research in plastic surgery. Future studies should focus on incorporating full-text articles, multimodal data, and larger models to further enhance performance and applicability.
Journal overview:
Aesthetic Surgery Journal is a peer-reviewed international journal focusing on scientific developments and clinical techniques in aesthetic surgery. The official publication of The Aesthetic Society, ASJ is also the official English-language journal of many major international societies of plastic, aesthetic and reconstructive surgery representing South America, Central America, Europe, Asia, and the Middle East. It is also the official journal of the British Association of Aesthetic Plastic Surgeons, the Canadian Society for Aesthetic Plastic Surgery and The Rhinoplasty Society.