Agent-guided AI-powered interpretation and reporting of nerve conduction studies and EMG (INSPIRE)
Authors: Alon Gorenshtein, Moran Sorka, Mohamed Khateb, Dvir Aran, Shahar Shelly
Journal: Clinical Neurophysiology, volume 177, Article 2110792 (Impact Factor 3.7; JCR Q1, Clinical Neurology)
Publication type: Journal Article
Publication date: 2025-06-15
DOI: 10.1016/j.clinph.2025.2110792
URL: https://www.sciencedirect.com/science/article/pii/S1388245725006443
Abstract
Objective
We aimed to create a tool for electrophysiologists that enhances and standardizes the interpretation of neuromuscular electrodiagnostic (EDX) tests using state-of-the-art generative AI technology.
Methods
We developed three model frameworks for interpreting and reporting EDX: (1) Base-LLM (large language model), which employs one-shot inference; (2) INSPIRE (Agent-Guided AI-Powered Interpretation and Reporting of Nerve Conduction Studies and EMG), a multi-agent AI framework; and (3) INSPIRE-Lite, a cost-efficient version of INSPIRE. INSPIRE uses three agents that integrate tools for reading reference tables and a long-context clinical neuromuscular textbook. Performance was evaluated using the AI-Generated EMG Report Score (AIGERS), a scoring system we developed.
Results
INSPIRE achieved an accuracy of 92.2 % for detecting normal versus abnormal tests, significantly outperforming the Base-LLM model, which achieved 62.6 % (p < 0.001). INSPIRE demonstrated significantly higher AIGERS scores overall and across the domains of finding, clinical diagnosis, and semantic concordance (p < 0.001). INSPIRE-Lite scored lower than INSPIRE in finding and clinical diagnosis (p = 0.001 and p = 0.004).
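As an illustration of the headline metric, normal-versus-abnormal accuracy can be sketched as a plain binary-classification score. This is a minimal sketch under stated assumptions: the labels below are made-up toy values rather than study data, the function name `binary_accuracy` is hypothetical, and the abstract does not describe how the authors computed the metric beyond reporting accuracy.

```python
from typing import Sequence


def binary_accuracy(reference: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the fraction of studies whose predicted label matches the reference."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if not reference:
        raise ValueError("at least one study is required")
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


# Toy labels for illustration only (NOT data from the study).
reference = ["normal", "abnormal", "abnormal", "normal", "abnormal"]
predicted = ["normal", "abnormal", "normal", "normal", "abnormal"]

accuracy = binary_accuracy(reference, predicted)
print(f"accuracy = {accuracy:.1%}")  # 4 of 5 toy labels match
```

In this toy case four of the five predicted labels agree with the reference, so the score is 0.8; the reported 92.2% and 62.6% are the same kind of proportion computed over the study's test set.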
Conclusion
Our model integrates variables such as patient medical history, current complaints, and EDX findings to manage and interpret EMG. It demonstrated superior performance while addressing hallucinations and data overload and aiding prioritization and standardization.
Significance
This model enables comprehensive analysis by integrating diverse clinical variables, enhancing diagnostic accuracy and efficiency of EDX reports.
About the journal:
As of January 1999, the journal Electroencephalography and Clinical Neurophysiology and its two sections, Electromyography and Motor Control and Evoked Potentials, have amalgamated to become this journal, Clinical Neurophysiology.
Clinical Neurophysiology is the official journal of the International Federation of Clinical Neurophysiology, the Brazilian Society of Clinical Neurophysiology, the Czech Society of Clinical Neurophysiology, the Italian Clinical Neurophysiology Society and the International Society of Intraoperative Neurophysiology. The journal is dedicated to fostering research and disseminating information on all aspects of both normal and abnormal functioning of the nervous system. The key aim of the publication is to disseminate scholarly reports on the pathophysiology underlying diseases of the central and peripheral nervous system of human patients. Clinical trials that use neurophysiological measures to document change are encouraged, as are manuscripts reporting data on integrated neuroimaging of central nervous function including, but not limited to, functional MRI, MEG, EEG, PET and other neuroimaging modalities.