AI-Powered Laryngoscopy: Exploring the Future With Google Gemini
Sean A Setzen, Katerina Andreadis, Olivier Elemento, Anaïs Rameau
Laryngoscope. Published 2025-02-20. DOI: 10.1002/lary.32089 (https://doi.org/10.1002/lary.32089)
Abstract
Foundation models (FMs) are general-purpose artificial intelligence (AI) neural networks trained on massive datasets, including code, text, audio, images, and video, to handle myriad tasks, from generating text to analyzing images or composing music. We evaluated Google Gemini 1.5 Pro, currently the multimodal FM with the largest token context window and the best-performing commercial model for video analysis, on interpreting laryngoscopy frames and videos sourced from Google Images and YouTube. Gemini recognized the procedure as laryngoscopy in 87/88 frames (98.9%) and in 15/15 video-laryngoscopies (100%), accurately diagnosed a pathology in 55/88 frames (62.5%) and 3/15 videos (20.0%), identified lesion sides in 58/88 frames (65.9%) and 6/15 videos (40%), and narrated two operative video-laryngoscopies without fine-tuning. These findings suggest that Gemini 1.5 Pro has significant potential for analyzing laryngoscopy and that FMs could serve as clinical decision support tools for complex expert tasks in otolaryngology. LEVEL OF EVIDENCE: 3.
About the journal:
The Laryngoscope has been the leading source of information on advances in the diagnosis and treatment of head and neck disorders since 1890. The Laryngoscope is the first choice among otolaryngologists for publication of their important findings and techniques. Each monthly issue of The Laryngoscope features peer-reviewed medical, clinical, and research contributions in general otolaryngology, allergy/rhinology, otology/neurotology, laryngology/bronchoesophagology, head and neck surgery, sleep medicine, pediatric otolaryngology, facial plastics and reconstructive surgery, oncology, and communicative disorders. Contributions include papers and posters presented at the Annual and Section Meetings of the Triological Society, as well as independent papers, "How I Do It", "Triological Best Practice" articles, and contemporary reviews. Theses authored by the Triological Society’s new Fellows as well as papers presented at meetings of the American Laryngological Association are published in The Laryngoscope.
• Broncho-esophagology
• Communicative disorders
• Head and neck surgery
• Plastic and reconstructive facial surgery
• Oncology
• Speech and hearing defects