AI-Powered Laryngoscopy: Exploring the Future With Google Gemini.

IF 2.2 | Zone 3 (Medicine) | Q3 Medicine, Research & Experimental

Laryngoscope | Pub Date: 2025-02-20 | DOI: 10.1002/lary.32089

Sean A Setzen, Katerina Andreadis, Olivier Elemento, Anaïs Rameau

Citations: 0

Abstract

Foundation models (FMs) are general-purpose artificial intelligence (AI) neural networks trained on massive datasets, including code, text, audio, images, and video, to handle myriad tasks, from generating text to analyzing images or composing music. We evaluated Google Gemini 1.5 Pro, currently the multimodal FM with the largest token context window and the best-performing commercial model for video analysis, on interpreting laryngoscopy frames and videos sourced from Google Images and YouTube. Gemini recognized the procedure as laryngoscopy in 87/88 frames (98.9%) and in 15/15 video-laryngoscopies (100%), accurately diagnosed a pathology in 55/88 frames (62.5%) and 3/15 videos (20.0%), identified lesion sides in 58/88 frames (65.9%) and 6/15 videos (40.0%), and narrated two operative video-laryngoscopies without fine-tuning. These findings suggest that Gemini 1.5 Pro holds considerable promise for analyzing laryngoscopy and point to the potential of FMs as clinical decision support tools for complex expert tasks in otolaryngology. LEVEL OF EVIDENCE: 3.
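The percentages in the abstract follow directly from the reported counts. The short Python sketch below recomputes them as a sanity check. It is illustrative only and not part of the study: the `REPORTED` table and the `percent` and `summarize` helpers are hypothetical names introduced here, and the counts are copied from the abstract rather than taken from any published dataset.

```python
# Counts copied from the abstract: (correct, total) per task and media type.
# The dictionary layout and helper names are assumptions made for illustration.
REPORTED = {
    ("procedure recognized", "frames"): (87, 88),
    ("procedure recognized", "videos"): (15, 15),
    ("pathology diagnosed", "frames"): (55, 88),
    ("pathology diagnosed", "videos"): (3, 15),
    ("lesion side identified", "frames"): (58, 88),
    ("lesion side identified", "videos"): (6, 15),
}


def percent(hits: int, total: int) -> float:
    """Return hits/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * hits / total, 1)


def summarize(counts: dict) -> dict:
    """Map each (task, media) key to its rounded percentage."""
    return {key: percent(hits, total) for key, (hits, total) in counts.items()}


if __name__ == "__main__":
    for (task, media), pct in summarize(REPORTED).items():
        print(f"{task:24s} {media:7s} {pct:5.1f}%")
```

Because the video sample is small (15 videos), each video shifts the video-level percentages by roughly 6.7 points, which is worth keeping in mind when comparing frame and video results.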




Source journal: Laryngoscope (Medicine, Otorhinolaryngology)

CiteScore: 6.50

Self-citation rate: 7.70%

Publication volume: 500

Review time: 2-4 weeks

Journal description: The Laryngoscope has been the leading source of information on advances in the diagnosis and treatment of head and neck disorders since 1890. The Laryngoscope is the first choice among otolaryngologists for publication of their important findings and techniques. Each monthly issue of The Laryngoscope features peer-reviewed medical, clinical, and research contributions in general otolaryngology, allergy/rhinology, otology/neurotology, laryngology/bronchoesophagology, head and neck surgery, sleep medicine, pediatric otolaryngology, facial plastics and reconstructive surgery, oncology, and communicative disorders. Contributions include papers and posters presented at the Annual and Section Meetings of the Triological Society, as well as independent papers, "How I Do It", "Triological Best Practice" articles, and contemporary reviews. Theses authored by the Triological Society's new Fellows as well as papers presented at meetings of the American Laryngological Association are published in The Laryngoscope.

• Broncho-esophagology
• Communicative disorders
• Head and neck surgery
• Plastic and reconstructive facial surgery
• Oncology
• Speech and hearing defects