The use of generative artificial intelligence-based dictation in a neurosurgical practice: a pilot study.

Impact factor 3.0 · Zone 2 (Medicine) · JCR Q2, Clinical Neurology
Benjamin S Hopkins, Jonathan Dallas, James Yu, Robert G Briggs, Lawrance K Chung, David J Cote, David Gomez, Ishan Shah, John D Carmichael, John C Liu, William J Mack, Gabriel Zada
Neurosurgical Focus 59(1):E8. Published 2025-07-01. Journal article. DOI: 10.3171/2025.4.FOCUS24834 (https://doi.org/10.3171/2025.4.FOCUS24834)

Abstract

Objective: Document dictation remains a significant clinical burden, and generative artificial intelligence (AI) systems built on transformer-based technology offer efficient speech processing methods that could streamline clinical documentation. This study aimed to evaluate the potential of generative AI to enhance dictation efficiency and workflow within a targeted neurosurgical practice.

Methods: Ten operative reports from both cranial and spinal neurosurgical procedures were dictated and recorded by three independent physicians. The audio files were processed by 1) a modified speech-to-text model built on the backbone architecture of OpenAI's Whisper model and 2) Nuance's Dragon Medical One as a comparative commercial standard. Word error rate (WER) was manually reviewed.
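For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system output, divided by the number of words in the reference. The sketch below illustrates that standard definition only. It is not the authors' procedure (the abstract states that WER was reviewed manually), and the function name, the lowercasing and whitespace tokenization, and the sample sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count.

    Assumption: words are compared case-insensitively after splitting on
    whitespace; a real evaluation would also need rules for punctuation,
    numerals, and abbreviations.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word in a five-word reference.
example_wer = word_error_rate(
    "the dura was opened sharply",
    "the dura was open sharply",
)
```

In this hypothetical example one of five reference words is wrong, so `example_wer` is 0.2, or 20%. The same calculation can be restricted to one category of error, as the study does when it excludes linguistic errors, but how errors are assigned to categories is a manual judgment that this sketch does not attempt.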

Results: The mean WER was 1.75% for Whisper and 1.54% for Dragon (p = 0.080). When linguistic errors were excluded, Whisper outperformed Dragon with a mean WER of 0.50% versus 1.34% (p < 0.001), and also in the mean number of total errors (Whisper: 6.1, Dragon: 9.7; p = 0.002). For all unstratified dictations, a positive correlation was seen between total errors and word count (p < 0.001, R² = 0.37), as well as between total errors and recording length (p < 0.001, R² = 0.22). A positive correlation was noted between words spoken per second and total errors for Dragon (p = 0.020, R² = 0.18), but not for Whisper (p = 0.205, R² = 0.06). Similarly, when only linguistic errors were analyzed, this trend held for Dragon (p = 0.014, R² = 0.20), but not for Whisper (p = 0.331, R² = 0.03).
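The R² values above describe how much of the variation in error counts is explained by a predictor such as word count or speaking rate. The abstract does not name the regression model, so the sketch below assumes a simple least-squares line, for which R² equals the squared Pearson correlation. The function name and the data points are hypothetical and are not taken from the study.

```python
def r_squared(xs: list[float], ys: list[float]) -> float:
    """Coefficient of determination for a simple least-squares line.

    Assumption: one predictor and one outcome, so R^2 is the squared
    Pearson correlation between xs and ys.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either sample is constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical data: speaking rate (words per second) against total errors.
words_per_second = [1.0, 2.0, 3.0, 4.0]
total_errors = [1.0, 3.0, 2.0, 4.0]
example_r2 = r_squared(words_per_second, total_errors)
```

For these made-up points `example_r2` is 0.64. By comparison, the reported value of 0.18 for Dragon means that speaking rate accounts for under a fifth of the variation in its total errors, even though the association is statistically significant.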

Conclusions: An AI-based model performed at a noninferior rate compared to a commercially available speech-to-text dictation program. Generative models offer potential benefits, such as contextual inference, that show promise in limiting errors at higher dictation speeds and in adjusting for impure input data.

Source journal: Neurosurgical Focus (Clinical Neurology; Surgery)
CiteScore: 6.30
Self-citation rate: 0.00%
Articles published: 261
Review time: 3 months